Abstract. Considering two independent Poisson processes, we address
the question of testing equality of their respective intensities.
We first propose single tests whose test statistics are $U$-statistics
based on general kernel functions. The corresponding critical values
are constructed from a non-asymptotic wild bootstrap approach,
leading to level $\alpha$ tests. Various choices for the kernel functions are
possible, including projection, approximation or reproducing kernels.
In this last case, we obtain a parametric rate of testing for a weak
metric defined in the RKHS associated with the considered reproducing
kernel. Then we introduce, in the other cases, an aggregation
procedure, which allows us to import ideas coming from model selection,
thresholding and/or approximation kernels adaptive estimation.
The resulting multiple tests are proved to be of level $\alpha$, and to satisfy
non-asymptotic oracle type conditions for the classical $L^2$-norm.
From these conditions, we deduce that they are adaptive in the minimax
sense over a large variety of classes of alternatives based on classical
and weak Besov bodies in the univariate case, but also Sobolev
and anisotropic Nikol'skii-Besov balls in the multivariate case.

1. Introduction. We consider the two-sample problem for general Poisson
processes. Let $N^1$ and $N^{-1}$ be two independent Poisson processes
observed on a measurable space $\mathbb{X}$, whose intensities with respect to some
non-atomic positive $\sigma$-finite measure $\mu$ on $\mathbb{X}$ are denoted by $f$ and $g$. Given
the observation of $N^1$ and $N^{-1}$, we address the question of testing the null
hypothesis $(H_0)$ "$f = g$" versus the alternative $(H_1)$ "$f \neq g$".

Many papers deal with the two-sample problem for homogeneous Poisson 
processes such as, among others, the historical ones of [41], [10], [19], or [48], 
whose applications were mainly turned to biology and medicine, and less 
frequently to reliability. More recent papers like [34], [38], [9], and [8] give 
M. FROMONT ET AL.

interesting numerical comparisons of various testing procedures. As for non- 
homogeneous Poisson processes, though a lot of references on the problem 
of testing proportionality of the hazard rates of the processes exist (see [14] 
for instance and the references therein), very few papers are devoted to a 
comparison of the intensities themselves. Bovett and Saw [5] and Deshpande 
et al. [15] respectively proposed conditional and unconditional procedures 
to test the null hypothesis "f/g is constant" versus "it is increasing". Desh- 
pande et al. [15] considered their test from a usual asymptotic point of view, 
proving that it is consistent against several large classes of alternatives. 

We propose in this paper to construct testing procedures of $(H_0)$ versus
$(H_1)$ without any parametric or monotonicity assumption on $f$ or $g$, and which
satisfy specific non-asymptotic performance properties.

In particular, for every $\alpha$ in $[0, 1]$, these tests are of level $\alpha$, that is they
have a probability of first kind error at most equal to $\alpha$. For special values
of $\alpha$, they are even of size $\alpha$, that is their probability of first kind error is
exactly equal to $\alpha$, since they involve very sharp critical values obtained
via a non-asymptotic wild bootstrap approach. In the classical two-sample
problem for i.i.d. samples, the choice of the critical values in testing proce- 
dures is a well-known crucial question. Indeed, the asymptotic distributions 
of many test statistics are not free from the common unknown density un- 
der the null hypothesis. In such cases, some bootstrap methods are often 
used to build data-driven critical values. By bootstrap methods, we mean 
the original ones introduced by Efron [16] of course, but also more general 
weighted bootstrap approaches such as the precursor Fisher's [17] permu- 
tation, the m out of n bootstrap introduced by Bretagnolle [6], the general 
exchangeably weighted bootstrap studied in [40] and including the Bayesian 
bootstrap of Rubin [46] for instance, as well as the wild bootstrap detailed 
in [37]. Except in the cases where the permutation approach is used, authors
generally prove that the obtained tests are (only) asymptotically of
level $\alpha$ (see among many other papers [43], [44], [39], and more recently [31]
for a complete and very interesting discussion). In this work, we adopt one
of these general weighted bootstrap approaches, but from a non-asymptotic
point of view. The critical values of our tests are constructed from wild
bootstrapped $U$-statistics, which are based on Rademacher variables. The use of
Rademacher variables is well-known in the bootstrap community since the 
work of Mammen [37] , but also particularly in the statistical learning com- 
munity since the works of Koltchinskii [32] and Bartlett et al. [4], followed 
by [33]. It was notably proposed for the construction of general confidence 
bands in a recent paper by Lounici and Nickl [36]. The main particularity 
of our study, as compared with previous ones, is that we prove here that,
under $(H_0)$, given the data, the considered wild bootstrapped $U$-statistics
have exactly the same distributions as our test statistics. The corresponding
tests are consequently of level $\alpha$ for every $\alpha$ in $[0,1]$, and even of size $\alpha$ for
particular values of $\alpha$. Note that as in [45] or in [23], it is also possible to
randomize these tests in order to turn them into size $\alpha$ tests for every $\alpha$.
In this sense, our bootstrap method can be viewed as an adapted version of 
the permutation bootstrap method in a Poisson framework. As usual even 
when permutation methods are considered, the wild bootstrapped critical 
values of our tests are not computed exactly in practice, but just approxi- 
mated through a Monte Carlo method. We also address this question from 
a non-asymptotic point of view, since we also focus on controlling the loss 
due to the Monte Carlo approximation. 

Our test statistics are based on a single kernel function which can be 
chosen either as a projection kernel, or as an approximation kernel, or as a 
reproducing kernel. A non-asymptotic study of the second kind error of our
tests is also performed. Given any $\beta$ in $[0, 1]$, depending on the chosen kernel,
we obtain non-asymptotic conditions which guarantee that the probability
of second kind error is at most equal to $\beta$. This can be done via a sharp
control of the wild bootstrapped critical values under the alternative, which 
results from concentration inequalities for Rademacher chaoses [12, 35]. 

In order to deduce from these conditions recognizable asymptotic rates of
testing, we assume that the measure $\mu$ on $\mathbb{X}$ satisfies $d\mu = n \, d\nu$, where $n$ can
be seen as a growing number whereas the measure $\nu$ is held fixed. Typically,
$n$ may be an integer and the above assumption amounts to considering the
Poisson processes $N^1$ and $N^{-1}$ as $n$ pooled i.i.d. Poisson processes with
respective intensities $f$ and $g$ w.r.t. $\nu$. The reader may also assume for the sake of
simplicity that $\mathbb{X}$ is a measurable subset of $\mathbb{R}^d$ and that $\nu$ is the Lebesgue
measure, but it is not required: $\nu$ may be any non-atomic positive $\sigma$-finite
measure on any measurable set $\mathbb{X}$. With this normalization, when a reproducing
kernel is considered, we obtain a parametric rate of testing for a weak
metric defined in the associated RKHS, in the spirit of [53] or [20] for more
classical weak metrics in i.i.d. samples frameworks. Our results complete
those of Gretton et al. [22], who introduced reproducing kernels in the
two-sample problem for i.i.d. samples. When a projection or an approximation
kernel is considered, we obtain the following condition: the probability of
second kind error of the test is at most equal to $\beta$ as soon as the $L^2$-distance
w.r.t. $\nu$ between $f$ and $g$ is larger than a bound, which reproduces a
bias-variance decomposition. This bound can be proved to be optimal with an
appropriate choice of the vector space defining the projection kernel, or
of the bandwidth defining the approximation kernel, a choice which highly
depends on the alternative.

In order to provide an adaptive test with respect to this choice, we pro- 
pose to aggregate several of the previous single kernel-based tests, making 
sure that the resulting multiple test is still of level $\alpha$. We establish oracle
type conditions, which guarantee that the probability of second kind error is
at most equal to $\beta$. This aggregation approach, inspired by adaptive estimation
methods such as model selection, thresholding or approximation kernels
methods, was used in many papers devoted to adaptive testing in various 
classical one-sample frameworks (see [49] or [50] for adaptive tests related 
to thresholding methods, [27] for adaptive tests related to model selection 
methods, [24] for adaptive tests related to approximation kernels methods, 
or [3] for adaptive tests related to both model selection and thresholding 
methods for instance). In a Poisson process framework, we proposed in [18] 
an aggregated test of homogeneity also based on both model selection and 
thresholding approaches. In the two-sample problem for i.i.d. samples, which 
is closely related to the present problem, Butucea and Tribouley [7] propose 
an adaptive test based on a thresholding approach. 

We complete the study by proving that our aggregated tests are also
adaptive in a non-asymptotic minimax sense over various classes $\mathcal{S}_\delta$ of
alternatives $(f, g)$ for which $(f - g)$ is smooth with parameter $\delta$. For clarity's
sake, let us here recall a few definitions. For any level $\alpha$ test $\Phi_\alpha$, with values
in $\{0, 1\}$ (rejecting $(H_0)$ when $\Phi_\alpha = 1$), one defines its uniform separation
rate $\rho(\Phi_\alpha, \mathcal{S}_\delta, \beta)$ over $\mathcal{S}_\delta$ as

(1.1) $\rho(\Phi_\alpha, \mathcal{S}_\delta, \beta) = \inf\Big\{ \rho > 0,\ \sup_{(f,g) \in \mathcal{S}_\delta,\ \|f - g\| > \rho} \mathbb{P}_{f,g}(\Phi_\alpha = 0) \leq \beta \Big\}$,

where $\|f - g\|^2 = \int_{\mathbb{X}} (f - g)^2 \, d\nu$, and $\mathbb{P}_{f,g}$ denotes the joint distribution of
$(N^1, N^{-1})$. A level $\alpha$ test $\Phi_\alpha$ is said to be minimax over a particular class $\mathcal{S}_\delta$
if its uniform separation rate achieves, up to a multiplicative factor, its best
possible value over $\mathcal{S}_\delta$, which is called the minimax separation rate over $\mathcal{S}_\delta$
(see [2]). It is said to be minimax adaptive if its uniform separation rates
achieve (up to a possible unavoidable small loss) the minimax separation
rates over several classes $\mathcal{S}_\delta$ simultaneously. A great number of papers deal
with the computation of the minimax separation rates over various classes 
of alternatives, or more precisely with the computation of their asymptotic 
equivalents, that are the minimax rates of testing defined in the key series of 
papers due to Ingster [26]. The question of the minimax adaptivity has also 
been widely studied since the work of Spokoiny [49], who first brought out 
a context where minimax adaptive testing without a small loss of efficiency 
is impossible. For the problem of testing the goodness-of-fit of a Poisson 
process, Ingster and Kutoyants [28] derived the minimax rate of testing over
a Sobolev or a Besov ball. For the problem of testing the homogeneity of 
a Poisson process, we derived in [18] similar minimax results considering 
classical Besov bodies, and we moreover obtained new minimax adaptivity 
results considering weak Besov bodies. 

In the present two-sample problem for Poisson processes, no previous
minimax result is available to our knowledge. As in [18], we here prove that the
aggregation of single projection kernel-based tests leads to minimax adaptive
tests over some classes of alternatives for which $(f - g)$ belongs to a Besov
or a weak Besov body. Such a result can be linked to the minimax results
obtained by Butucea and Tribouley [7], noting however that the classes of
alternatives they consider require both $f$ and $g$ to belong to a Besov space,
which is more restrictive than only imposing some regularity assumptions
on $(f - g)$. Then, when considering the aggregation of single approximation
kernel-based tests, we obtain upper bounds for the uniform separation rates 
over some classes of alternatives based on multivariate Sobolev or anisotropic 
Nikol'skii-Besov balls. These upper bounds, which are conjectured to be op- 
timal from results of Horowitz and Spokoiny [24] or Ingster and Stepanova 
[29] in other frameworks, are completely new in our Poisson setting, and 
even in a general setting for anisotropic Nikol'skii-Besov balls. 

The paper is organized as follows. In Section 2, we introduce our single 
kernel-based tests. As explained above, the corresponding critical values 
are constructed from a wild bootstrap approach, leading to level $\alpha$ single
tests. We then give conditions ensuring that these single tests also have a
probability of second kind error at most equal to $\beta$, and we study the cost due
to the Monte Carlo approximation of the wild bootstrapped critical values.
In Section 3, we construct level $\alpha$ multiple tests by aggregating several of
the single tests introduced in Section 2. Oracle type conditions are obtained,
ensuring that these multiple tests have a probability of second kind error at
most equal to $\beta$. From these conditions, some of our tests are also proved to
be minimax adaptive over various classes of alternatives based on classical
and weak Besov bodies in the univariate case, or Sobolev and anisotropic
Nikol'skii-Besov balls in the multivariate case. The major proofs are given
in Section 4, whereas a simulation study and the other proofs can be found 
in supplementary materials. 

Let us now introduce some notation that will be used throughout the paper.
For any measurable function $h$, let, when they exist, $\|h\|_\infty = \sup_{x \in \mathbb{X}} |h(x)|$
and $\|h\|_1 = \int_{\mathbb{X}} |h(x)| \, d\nu_x$. Recalling that $\|h\| = (\int_{\mathbb{X}} h(x)^2 \, d\nu_x)^{1/2}$, we introduce
the scalar product $\langle \cdot, \cdot \rangle$ associated with $\|\cdot\|$. We denote by $dN^1$ and $dN^{-1}$
the point measures associated with $N^1$ and $N^{-1}$ respectively, and, in keeping
with the notation $\mathbb{P}_{f,g}$ for the joint distribution of $(N^1, N^{-1})$, $\mathbb{E}_{f,g}$ stands for
the corresponding expectation. We set, for any event $\mathcal{A}$ based on $(N^1, N^{-1})$,
$\mathbb{P}_{(H_0)}(\mathcal{A}) = \sup_{\{(f,g),\ f = g\}} \mathbb{P}_{f,g}(\mathcal{A})$.

Furthermore, we will introduce some constants, that we do not intend to 
evaluate here, and that are denoted by $C(\alpha, \beta, \ldots)$, meaning that they may
depend on $\alpha, \beta, \ldots$. Though they are denoted in the same way, they may
vary from one line to another. 

Finally, let us make the two following assumptions, which together imply
that $f$ and $g$ belong to $L^2(\mathbb{X}, d\nu)$, and which will be satisfied throughout the
paper, except when specified.

Assumption 1. $\|f\|_1 < +\infty$ and $\|g\|_1 < +\infty$.

Assumption 2. $\|f\|_\infty < +\infty$ and $\|g\|_\infty < +\infty$.

2. Single kernel-based tests with non-asymptotic wild bootstrapped 
critical values. 

2.1. Single kernel-based test statistics. Since $f$ and $g$ are assumed to
satisfy Assumptions 1 and 2, they belong to $L^2(\mathbb{X}, d\nu)$.
Hence, testing $(H_0)$ "$f = g$" versus $(H_1)$ "$f \neq g$" here amounts to testing
that "$\|f - g\| = 0$" versus "$\|f - g\| > 0$". Considering a well-chosen finite
dimensional subspace $S$ of $L^2(\mathbb{X}, d\nu)$, if $\Pi_S$ denotes the orthogonal projection
onto $S$ for $\langle \cdot, \cdot \rangle$, any estimator of an increasing function of $\|\Pi_S(f - g)\|^2$ may
thus be a relevant candidate to be a test statistic. Let $\{\varphi_\lambda, \lambda \in \Lambda\}$ be an
orthonormal basis of $S$ for $\langle \cdot, \cdot \rangle$, and let

$T = \sum_{\lambda \in \Lambda} \Big( \Big( \int_{\mathbb{X}} \varphi_\lambda \, dN^1 - \int_{\mathbb{X}} \varphi_\lambda \, dN^{-1} \Big)^2 - \int_{\mathbb{X}} \varphi_\lambda^2 \, dN \Big)$,

where $N$ is the pooled Poisson process whose point measure is given by
$dN = dN^1 + dN^{-1}$. Since
$\mathbb{E}\big[ (\int_{\mathbb{X}} \varphi_\lambda \, dN^1)^2 \big] = (\int_{\mathbb{X}} \varphi_\lambda(x) f(x) \, d\mu_x)^2 + \int_{\mathbb{X}} \varphi_\lambda^2(x) f(x) \, d\mu_x$,
and similarly for $\mathbb{E}\big[ (\int_{\mathbb{X}} \varphi_\lambda \, dN^{-1})^2 \big]$, recalling that $d\mu = n \, d\nu$, it is easy to see
that $T$ is an unbiased estimator of $n^2 \|\Pi_S(f - g)\|^2$, and thus also a possible
test statistic, whose large values lead to reject $(H_0)$.

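Let $(\varepsilon^0_x)_{x \in N}$ be the marks of the points from the pooled process $N$, defined
by $\varepsilon^0_x = 1$ if the point $x$ of $N$ belongs to $N^1$ and $\varepsilon^0_x = -1$ if the point $x$ of
$N$ belongs to $N^{-1}$. Then $T$ can also be expressed as

$T = \sum_{\lambda \in \Lambda} \sum_{x \neq x' \in N} \varphi_\lambda(x) \varphi_\lambda(x') \varepsilon^0_x \varepsilon^0_{x'} = \sum_{x \neq x' \in N} \Big( \sum_{\lambda \in \Lambda} \varphi_\lambda(x) \varphi_\lambda(x') \Big) \varepsilon^0_x \varepsilon^0_{x'}$.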
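
The passage from the first definition of $T$ to this double sum can be checked in a few lines. The sketch below writes $\varepsilon^0_x$ for the mark of the point $x$ and only uses the fact that $(\varepsilon^0_x)^2 = 1$:

```latex
\begin{align*}
\int_{\mathbb{X}} \varphi_\lambda \, dN^1 - \int_{\mathbb{X}} \varphi_\lambda \, dN^{-1}
  &= \sum_{x \in N} \varphi_\lambda(x) \varepsilon^0_x, \\
\Big( \sum_{x \in N} \varphi_\lambda(x) \varepsilon^0_x \Big)^2
  &= \sum_{x \in N} \varphi_\lambda(x)^2
   + \sum_{x \neq x' \in N} \varphi_\lambda(x) \varphi_\lambda(x') \varepsilon^0_x \varepsilon^0_{x'}, \\
\int_{\mathbb{X}} \varphi_\lambda^2 \, dN &= \sum_{x \in N} \varphi_\lambda(x)^2 .
\end{align*}
```

Subtracting the last line from the second one and summing over $\lambda \in \Lambda$ leaves exactly the double sum over $x \neq x'$.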
Starting from this remark, we can thus generalize the test statistic $T$ by
replacing in its expression the function $(x, x') \in \mathbb{X}^2 \mapsto \sum_{\lambda \in \Lambda} \varphi_\lambda(x) \varphi_\lambda(x') \in \mathbb{R}$
by a general kernel function. So, let $K$ be any symmetric kernel function
$\mathbb{X} \times \mathbb{X} \to \mathbb{R}$ satisfying:

Assumption 3. $\int_{\mathbb{X}^2} K^2(x, x') (f + g)(x) (f + g)(x') \, d\nu_x d\nu_{x'} < +\infty$.

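Denoting by $\mathbb{X}^{[2]}$ the set $\{(x, x') \in \mathbb{X}^2, x \neq x'\}$, we introduce the statistic

(2.1) $T_K = \sum_{x \neq x' \in N} K(x, x') \varepsilon^0_x \varepsilon^0_{x'} = \int_{\mathbb{X}^{[2]}} K(x, x') \varepsilon^0_x \varepsilon^0_{x'} \, dN_x dN_{x'}$.

Since for every $x$ in $N$, $\mathbb{E}[\varepsilon^0_x | N] = (f(x) - g(x)) / (f(x) + g(x))$ (see Proposition 1
below for instance),

$\mathbb{E}_{f,g}[T_K] = \mathbb{E}_{f,g}\Big[ \mathbb{E}\Big[ \int_{\mathbb{X}^{[2]}} K(x, x') \varepsilon^0_x \varepsilon^0_{x'} \, dN_x dN_{x'} \,\Big|\, N \Big] \Big]$

$= \mathbb{E}_{f,g}\Big[ \int_{\mathbb{X}^{[2]}} K(x, x') \frac{f(x) - g(x)}{f(x) + g(x)} \frac{f(x') - g(x')}{f(x') + g(x')} \, dN_x dN_{x'} \Big]$

$= \int_{\mathbb{X}^2} K(x, x') (f - g)(x) (f - g)(x') \, d\mu_x d\mu_{x'}$

$= n^2 \int_{\mathbb{X}^2} K(x, x') (f - g)(x) (f - g)(x') \, d\nu_x d\nu_{x'}$.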



In the following, we use the notation:

(2.2) $K[p](x') = \int_{\mathbb{X}} K(x, x') p(x) \, d\nu_x$.

With this notation, $T_K$ is then an unbiased estimator of

(2.3) $\mathcal{E}_K = n^2 \langle K[f - g], f - g \rangle$,

whose existence is ensured thanks to Assumptions 1 and 3.
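
As an illustration, a statistic of the form (2.1) can be computed from the pooled points and their marks with a few lines of Python. This is a minimal sketch, assuming one-dimensional points and NumPy; the names `kernel_u_statistic` and `gaussian_kernel`, the bandwidth and the toy data are hypothetical choices made for the example and are not taken from the paper.

```python
import numpy as np


def kernel_u_statistic(points, marks, kernel):
    """Sum of K(x_i, x_j) * m_i * m_j over all ordered pairs with i != j."""
    points = np.asarray(points, dtype=float)
    marks = np.asarray(marks, dtype=float)
    # Gram matrix of the kernel on the pooled points (broadcast evaluation).
    gram = kernel(points[:, None], points[None, :])
    # Pairs (i, i) are excluded from the statistic.
    np.fill_diagonal(gram, 0.0)
    return float(marks @ gram @ marks)


def gaussian_kernel(x, y, bandwidth=0.5):
    """Illustrative symmetric kernel (an assumption of this example)."""
    return np.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


# Toy pooled sample: marks +1 for points of N^1, -1 for points of N^-1.
points = np.array([0.1, 0.4, 0.9])
marks = np.array([1, -1, 1])
t_k = kernel_u_statistic(points, marks, gaussian_kernel)
```

Zeroing the diagonal of the Gram matrix is what restricts the double sum to pairs of distinct points.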

We have chosen to consider and study in this paper three possible examples
of kernel functions. For each example, we give a simpler expression of
$\mathcal{E}_K$, which justifies the choice of $T_K$ as test statistic.

[Projection kernel case] Our first choice for $K$ is a symmetric kernel function
based on an orthonormal family $\{\varphi_\lambda, \lambda \in \Lambda\}$ for $\langle \cdot, \cdot \rangle$:

$K(x, x') = \sum_{\lambda \in \Lambda} \varphi_\lambda(x) \varphi_\lambda(x')$.

When the cardinality of $\Lambda$ is finite, $T_K$ corresponds to the above natural
test statistic $T$. When the cardinality of $\Lambda$ is infinite, we assume that
$\sup_{x, x' \in \mathbb{X}} \sum_{\lambda \in \Lambda} |\varphi_\lambda(x) \varphi_\lambda(x')| < +\infty$, which ensures that $K(x, x')$ is defined
for all $x, x'$ in $\mathbb{X}$ and that Assumption 3 holds. Typically, if $\mathbb{X} = \mathbb{R}^d$ and
if the functions $\{\varphi_\lambda, \lambda \in \Lambda\}$ correspond to indicator functions with disjoint
supports, this condition will be satisfied.

We check in these cases that for every $s$ in $L^2(\mathbb{X}, d\nu)$, $K[s] = \Pi_S(s)$, where
$S$ is the subspace of $L^2(\mathbb{X}, d\nu)$ generated by $\{\varphi_\lambda, \lambda \in \Lambda\}$, and $\Pi_S$ denotes as
above the orthogonal projection onto $S$ for $\langle \cdot, \cdot \rangle$. This justifies that such a
kernel function $K$ is called a projection kernel and that

$\mathcal{E}_K = n^2 \|\Pi_S(f - g)\|^2$.
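
To make the finite-dimensional case concrete, the sketch below checks numerically that the kernel form of the statistic coincides with the natural statistic $T$ for a histogram basis. It assumes $\mathbb{X} = [0, 1)$, $\nu$ the Lebesgue measure and $\varphi_\lambda = \sqrt{D} \, \mathbb{1}_{\text{bin } \lambda}$ for $D$ equal bins; the function names, the seed and the fixed sample sizes are hypothetical choices for the example.

```python
import numpy as np


def histogram_projection_kernel(x, y, n_bins):
    """K(x, y) = sum_lambda phi_lambda(x) phi_lambda(y) = n_bins * 1{same bin}."""
    same_bin = np.floor(x * n_bins) == np.floor(y * n_bins)
    return n_bins * same_bin.astype(float)


def natural_statistic(sample_pos, sample_neg, n_bins):
    """T = sum_lambda [(int phi dN^1 - int phi dN^-1)^2 - int phi^2 dN]."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    counts_pos, _ = np.histogram(sample_pos, bins=edges)
    counts_neg, _ = np.histogram(sample_neg, bins=edges)
    # phi_lambda^2 equals n_bins on its bin, hence the common factor n_bins.
    return float(n_bins * np.sum((counts_pos - counts_neg) ** 2
                                 - (counts_pos + counts_neg)))


def kernel_statistic(sample_pos, sample_neg, n_bins):
    """Same quantity written as a sum over pairs of distinct pooled points."""
    points = np.concatenate([sample_pos, sample_neg])
    marks = np.concatenate([np.ones(len(sample_pos)), -np.ones(len(sample_neg))])
    gram = histogram_projection_kernel(points[:, None], points[None, :], n_bins)
    np.fill_diagonal(gram, 0.0)
    return float(marks @ gram @ marks)


rng = np.random.default_rng(0)
sample_pos = rng.uniform(0.0, 1.0, size=30)   # stand-in for the points of N^1
sample_neg = rng.uniform(0.0, 1.0, size=25)   # stand-in for the points of N^-1
t_natural = natural_statistic(sample_pos, sample_neg, n_bins=8)
t_kernel = kernel_statistic(sample_pos, sample_neg, n_bins=8)
```

Both functions return the same number, which is the identity behind the statement that $T_K$ corresponds to $T$ when $\Lambda$ is finite.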

[Approximation kernel case] When $\mathbb{X} = \mathbb{R}^d$ and $\nu$ is the Lebesgue measure,
our second choice for $K$ is a kernel function based on an approximation
kernel $k$ in $L^2(\mathbb{R}^d)$, and such that $k(-x) = k(x)$: for $x = (x_1, \ldots, x_d)$,
$x' = (x'_1, \ldots, x'_d)$ in $\mathbb{X}$,

$K(x, x') = \frac{1}{\prod_{i=1}^d h_i} \, k\Big( \frac{x_1 - x'_1}{h_1}, \ldots, \frac{x_d - x'_d}{h_d} \Big)$,

where $h = (h_1, \ldots, h_d)$ is a vector of $d$ positive bandwidths. Note that the
assumption that $k \in L^2(\mathbb{R}^d)$ together with Assumption 2 ensures that
Assumption 3 holds. Then, in this case,

$\mathcal{E}_K = n^2 \langle k_h * (f - g), f - g \rangle$,

where $k_h(u_1, \ldots, u_d) = \frac{1}{\prod_{i=1}^d h_i} \, k\big( \frac{u_1}{h_1}, \ldots, \frac{u_d}{h_d} \big)$ and $*$ is the usual convolution
operator with respect to the measure $\nu$.
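
A kernel of this form is straightforward to code. The sketch below is only an illustration: it takes for $k$ the standard Gaussian density on $\mathbb{R}^d$, which is symmetric and square integrable, and the names `gaussian_density` and `approximation_kernel` as well as the numerical values are assumptions of the example.

```python
import numpy as np


def gaussian_density(u):
    """Standard Gaussian density on R^d, evaluated along the last axis of u."""
    d = u.shape[-1]
    return np.exp(-0.5 * np.sum(u ** 2, axis=-1)) / (2.0 * np.pi) ** (d / 2.0)


def approximation_kernel(x, x_prime, bandwidths, k=gaussian_density):
    """K(x, x') = k((x_1 - x'_1)/h_1, ..., (x_d - x'_d)/h_d) / prod(h_i)."""
    h = np.asarray(bandwidths, dtype=float)
    u = (np.asarray(x, dtype=float) - np.asarray(x_prime, dtype=float)) / h
    return k(u) / np.prod(h)


x = np.array([0.2, 1.0])
x_prime = np.array([0.5, 0.4])
h = np.array([0.3, 0.6])          # one positive bandwidth per coordinate
value = approximation_kernel(x, x_prime, h)
```

Because $k(-u) = k(u)$, swapping `x` and `x_prime` returns the same value, so the resulting $K$ is symmetric as required.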

[Reproducing kernel case] Our third choice for $K$ is a general reproducing
kernel (see [47] for instance) such that

$K(x, x') = \langle \theta(x), \theta(x') \rangle_{\mathcal{H}_K}$,

where $\theta$ and $\mathcal{H}_K$ are a representation function and an RKHS associated with
$K$. Here, $\langle \cdot, \cdot \rangle_{\mathcal{H}_K}$ denotes the scalar product of $\mathcal{H}_K$. We also choose $K$ such
that it satisfies Assumption 3.

This choice leads to a test statistic close to the one of Gretton et al. 
[22] for the classical two-sample problem for i.i.d. samples of equal sizes. 
We will however see that the corresponding critical value is not constructed 
here in the same way as in [22]. While Gretton et al. derive their critical
value from either concentration inequalities, or asymptotic arguments, or an
asymptotic Efron's bootstrap approach, we construct our critical value from
a non-asymptotic wild bootstrap approach.
In this case, it is easy to see that

$\mathcal{E}_K = n^2 \|m_f - m_g\|^2_{\mathcal{H}_K}$,

where $m_f = \int_{\mathbb{X}} K(\cdot, x) f(x) \, d\nu_x$ and $m_g = \int_{\mathbb{X}} K(\cdot, x) g(x) \, d\nu_x$. Note that in a
"density" context where $\int_{\mathbb{X}} f(x) \, d\nu_x = \int_{\mathbb{X}} g(x) \, d\nu_x = 1$, $\mathcal{E}_K$ is $n^2$ times the
so-called squared Maximum Mean Discrepancy on the unit ball in the RKHS
$\mathcal{H}_K$ (see [22]) between the distributions $f d\nu$ and $g d\nu$, and that the functions
$m_f$ and $m_g$ are known (see [51] for instance) as the mean embeddings in
$\mathcal{H}_K$ of the distributions $f d\nu$ and $g d\nu$ respectively. Moreover, in this context,
assuming that the kernel $K$ is characteristic (see also [51]), the map which
assigns its mean embedding in $\mathcal{H}_K$ to any probability distribution is injective
by definition, so $\mathcal{E}_K = 0$ if and only if $f = g$.
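
The expression of $\mathcal{E}_K$ in terms of mean embeddings follows from (2.3) and the reproducing property. The following sketch assumes that the scalar product of $\mathcal{H}_K$ can be exchanged with the integral defining the mean embeddings, which holds for instance for a bounded measurable kernel:

```latex
\begin{align*}
K[f - g](x') &= \int_{\mathbb{X}} K(x, x') (f - g)(x) \, d\nu_x = (m_f - m_g)(x')
  = \langle m_f - m_g, K(\cdot, x') \rangle_{\mathcal{H}_K}, \\
\langle K[f - g], f - g \rangle
  &= \int_{\mathbb{X}} \langle m_f - m_g, K(\cdot, x') \rangle_{\mathcal{H}_K} (f - g)(x') \, d\nu_{x'} \\
  &= \Big\langle m_f - m_g, \int_{\mathbb{X}} K(\cdot, x') (f - g)(x') \, d\nu_{x'} \Big\rangle_{\mathcal{H}_K}
   = \| m_f - m_g \|^2_{\mathcal{H}_K}.
\end{align*}
```

Multiplying by $n^2$ gives the stated value of $\mathcal{E}_K$.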

We want to mention here that the introduction of reproducing kernels is 
particularly pertinent if the space X is unusual or pretty large with respect 
to the (mean) number of observations and/or if the measure v is not well 
specified or not easy to deal with. In such situations, the use of reproducing 
kernels may be the only possible way to compute a meaningful test (see [22] 
where such kernels are used for microarrays data and graphs). 

Thus, for each of the three above choices for $K$, considering a test which
rejects $(H_0)$ when $T_K$ is "large enough" seems to be reasonable. It remains
to explain what we mean by "large enough", that is to define the critical 
values used in our tests. 

2.2. Critical values based on a non-asymptotic wild bootstrap approach.
The critical values we use here are based on a non-asymptotic wild bootstrap
approach, that we present and justify in this section. To do this, we
start from the remark that under $(H_0)$, the test statistic $T_K$ is a degenerate
$U$-statistic of order 2, for which adequate bootstrap methods were developed
in particular in [6] and [1]. Bretagnolle [6] first noticed that a naive application
of Efron's original bootstrap fails for degenerate $U$-statistics, since it
leads the bootstrapped statistic to lose the degeneracy property. He therefore
introduced the more appropriate $m$ out of $n$ bootstrap, while Arcones and
Giné [1] preferred to keep on using Efron's original bootstrap, but by forcing
the bootstrapped statistic to satisfy the degeneracy property through
a centering trick. The results of Arcones and Giné were then generalized
to other kinds of bootstrap methods, and in particular Bayesian and wild
bootstrapped $U$-statistics were introduced in [25], [30] and [13].
Following [13], we introduce a sequence $(\varepsilon_i)_{i \in \mathbb{N}}$ of i.i.d. Rademacher variables
independent of $N$. Denoting by $N_n$ the size of the pooled process $N$,
and by $\{X_1, \ldots, X_{N_n}\}$ the points of $N$, a wild bootstrapped version of $T_K$
may be expressed as $\sum_{i \neq i' \in \{1, \ldots, N_n\}} K(X_i, X_{i'}) \varepsilon^0_{X_i} \varepsilon^0_{X_{i'}} \varepsilon_i \varepsilon_{i'}$. We consider in
fact the simpler version

(2.4) $T_K^\varepsilon = \sum_{i \neq i' \in \{1, \ldots, N_n\}} K(X_i, X_{i'}) \varepsilon_i \varepsilon_{i'}$,

that can be proved to have, under $(H_0)$, conditionally on $N$, the same
distribution as the above wild bootstrapped version of $T_K$. We now choose the
quantile of the conditional distribution of $T_K^\varepsilon$ given $N$ as critical value for
our test.

More precisely, for $\alpha$ in $(0,1)$, if $q^{(N)}_{K,1-\alpha}$ denotes the $(1 - \alpha)$ quantile of
the distribution of $T_K^\varepsilon$ conditionally on $N$, we consider the test that rejects
$(H_0)$ when $T_K > q^{(N)}_{K,1-\alpha}$. The corresponding test function is defined by

(2.5) $\Phi_{K,\alpha} = \mathbb{1}_{T_K > q^{(N)}_{K,1-\alpha}}$.

Note that in practice, the true conditional quantile $q^{(N)}_{K,1-\alpha}$ is not exactly
computed, but in fact just approximated by a classical Monte Carlo method.
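
The whole procedure can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not the exact Monte Carlo scheme analysed in Section 2.4: the critical value is taken as the empirical $(1 - \alpha)$ quantile of `n_bootstrap` simulated values of the bootstrapped statistic, the points are one-dimensional, and the names `wild_bootstrap_test` and `gaussian_kernel`, the bandwidth and the toy data (with fixed rather than Poisson sample sizes) are hypothetical.

```python
import numpy as np


def wild_bootstrap_test(points, marks, kernel, alpha=0.05, n_bootstrap=999, rng=None):
    """Return (reject, statistic, critical_value) for a single kernel-based test."""
    rng = np.random.default_rng() if rng is None else rng
    points = np.asarray(points, dtype=float)
    marks = np.asarray(marks, dtype=float)

    # Gram matrix with a zero diagonal, so that only pairs i != i' contribute.
    gram = kernel(points[:, None], points[None, :])
    np.fill_diagonal(gram, 0.0)

    # Observed statistic, computed with the true marks.
    statistic = float(marks @ gram @ marks)

    # Bootstrapped statistics: the marks are replaced by fresh Rademacher signs.
    signs = rng.choice([-1.0, 1.0], size=(n_bootstrap, points.size))
    bootstrapped = np.sum((signs @ gram) * signs, axis=1)

    # Empirical (1 - alpha) quantile of the simulated values (alpha in (0, 1)).
    rank = int(np.ceil((1.0 - alpha) * n_bootstrap))
    critical_value = float(np.sort(bootstrapped)[rank - 1])
    return statistic > critical_value, statistic, critical_value


def gaussian_kernel(x, y, bandwidth=0.2):
    """Illustrative symmetric kernel (an assumption of this example)."""
    return np.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


rng = np.random.default_rng(1)
sample_pos = rng.uniform(0.0, 0.5, size=40)   # stand-in for the points of N^1
sample_neg = rng.uniform(0.5, 1.0, size=40)   # stand-in for the points of N^-1
points = np.concatenate([sample_pos, sample_neg])
marks = np.concatenate([np.ones(40), -np.ones(40)])
reject, statistic, critical_value = wild_bootstrap_test(
    points, marks, gaussian_kernel, alpha=0.05, rng=rng
)
```

In this toy example the two samples live on disjoint intervals, so the observed statistic is far above the simulated critical value and the test rejects.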

Of course, such bootstrap tests are not completely new in the statistical
scene. However, the main particularity of our work is that we justify our
test from a non-asymptotic point of view. We actually prove that under
$(H_0)$, conditionally on $N$, $T_K$ and $T_K^\varepsilon$ have exactly the same distribution.
As a consequence the test defined by $\Phi_{K,\alpha}$ is of level $\alpha$, that is it has a
probability of first kind error at most equal to $\alpha$. We will briefly see in the
next section that it may even be randomized to be of size $\alpha$, that is to have
a probability of first kind error exactly equal to $\alpha$.

In the same way, instead of focusing as many previous authors on the
consistency against some alternatives, we give precise conditions on the
alternatives which guarantee that $\Phi_{K,\alpha}$ has a probability of second kind error
controlled by a prescribed value $\beta$ in $(0, 1)$. These results are detailed in the
next section.

Furthermore, we do not forget that studying our tests from a non-asymptotic
point of view poses the additional question of the exact loss in probabilities
of first and second kind errors due to the Monte Carlo approximation of
$q^{(N)}_{K,1-\alpha}$. We also address this question in Section 2.4.

Such a non-asymptotic approach is actually conceivable thanks to the
following proposition, which can be deduced from a general result of [11],
but whose quite easy and complete proof is given in Section 4 for the sake of
understanding.

Proposition 1. Let $N^1$ and $N^{-1}$ be two independent Poisson processes
on a metric space $\mathbb{X}$ with intensities $f$ and $g$ with respect to some measure
$\mu$ on $\mathbb{X}$ and such that Assumption 1 is satisfied. Then the pooled process $N$
whose point measure is given by $dN = dN^1 + dN^{-1}$ is a Poisson process on
$\mathbb{X}$ with intensity $f + g$ with respect to $\mu$. Moreover, let $(\varepsilon^0_x)_{x \in N}$ be defined
by $\varepsilon^0_x = 1$ if $x$ belongs to $N^1$ and $\varepsilon^0_x = -1$ if $x$ belongs to $N^{-1}$. Then,
conditionally on $N$, the variables $(\varepsilon^0_x)_{x \in N}$ are i.i.d. and for every $x$ in $N$,

(2.6) $\mathbb{P}(\varepsilon^0_x = 1 | N) = \frac{f(x)}{f(x) + g(x)}$, $\quad \mathbb{P}(\varepsilon^0_x = -1 | N) = \frac{g(x)}{f(x) + g(x)}$,

with the convention that $0/0 = 1/2$.
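
Proposition 1 also suggests a simple way of simulating the data: draw the pooled process first, then draw the marks independently with the probabilities (2.6). The sketch below is one possible implementation under simplifying assumptions: $\mathbb{X} = [0, 1]$, $\nu$ is the Lebesgue measure, $d\mu = n \, d\nu$, and $f + g$ is bounded by a known constant so that the pooled process can be obtained by thinning. The function name, the intensities and the constants are hypothetical.

```python
import numpy as np


def simulate_pooled_with_marks(f, g, n, upper_bound, rng):
    """Simulate the pooled process on [0, 1] and its marks.

    f, g: vectorized intensities w.r.t. the Lebesgue measure, f + g <= upper_bound.
    n: scaling factor, the pooled process has intensity n * (f + g).
    """
    # Homogeneous candidates with rate n * upper_bound, then thinning.
    n_candidates = rng.poisson(n * upper_bound)
    candidates = rng.uniform(0.0, 1.0, size=n_candidates)
    total = f(candidates) + g(candidates)
    kept = candidates[rng.uniform(0.0, upper_bound, size=n_candidates) < total]
    # Marks drawn independently given the points, as in (2.6).
    # Kept points always satisfy f + g > 0, so the ratio is well defined.
    prob_plus = f(kept) / (f(kept) + g(kept))
    marks = np.where(rng.uniform(size=kept.size) < prob_plus, 1, -1)
    return kept, marks


rng = np.random.default_rng(0)
points, marks = simulate_pooled_with_marks(
    f=lambda x: 2.0 * x,            # illustrative intensity of N^1
    g=lambda x: 2.0 * (1.0 - x),    # illustrative intensity of N^-1
    n=2000,
    upper_bound=2.0,
    rng=rng,
)
```

The points with mark $+1$ then form a realization of $N^1$ and those with mark $-1$ a realization of $N^{-1}$.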

2.3. Probabilities of first and second kind errors. We here study the
probabilities of first and second kind errors of the test $\Phi_{K,\alpha}$ defined by (2.5).

From Proposition 1, we deduce that under $(H_0)$, $T_K$ and $T_K^\varepsilon$ have exactly
the same distribution conditionally on $N$. As a result, given $\alpha$ in $(0,1)$,
under $(H_0)$,

(2.7) $\mathbb{P}\big( T_K > q^{(N)}_{K,1-\alpha} \,\big|\, N \big) \leq \alpha$.

By taking the expectation over $N$, we obtain that

$\mathbb{P}_{(H_0)}(\Phi_{K,\alpha} = 1) \leq \alpha$.

In fact, the inequality (2.7) can be turned into an equality only for some particular
values of $\alpha$, due to the discreteness of the conditional distribution of $T_K^\varepsilon$
given $N$. To go a little further, from Proposition 1, we deduce that the
randomization hypothesis as defined by Romano and Wolf [45] and introduced
by Hoeffding [23] is satisfied. From the construction of Hoeffding [23], one
can therefore randomize $\Phi_{K,\alpha}$ to obtain a test $\tilde{\Phi}_{K,\alpha}$ such that $\tilde{\Phi}_{K,\alpha} \geq \Phi_{K,\alpha}$
a.s. and such that under $(H_0)$, $\mathbb{E}(\tilde{\Phi}_{K,\alpha} | N) = \alpha$ for every $\alpha$. Thus, by using
the classical tool of randomization, one can circumvent the trouble due to
the atoms of the discrete conditional distribution of $T_K$ given $N$, and obtain
a test with a probability of first kind error exactly equal to $\alpha$ for every $\alpha$.
Note that the randomized test $\tilde{\Phi}_{K,\alpha}$ necessarily has a probability of second
kind error smaller than that of $\Phi_{K,\alpha}$, since $\tilde{\Phi}_{K,\alpha} \geq \Phi_{K,\alpha}$ a.s.

However, in practice, since the conditional quantile $q^{(N)}_{K,1-\alpha}$ is approximated
by a Monte Carlo method as we have explained above, we do not
have access to the true randomized version of $\Phi_{K,\alpha}$. This explains why we
have decided to focus in the following on the non-randomized test $\Phi_{K,\alpha}$.

Given $\beta$ in $(0, 1)$, we now aim at bringing out a non-asymptotic condition
on the alternative $(f, g)$ which will guarantee that $\mathbb{P}_{f,g}(\Phi_{K,\alpha} = 0) \leq \beta$.
Denoting by $q^{\alpha}_{K,1-\beta/2}$ the $(1 - \beta/2)$ quantile of the conditional quantile
$q^{(N)}_{K,1-\alpha}$,

$\mathbb{P}_{f,g}(\Phi_{K,\alpha} = 0) \leq \mathbb{P}_{f,g}(T_K \leq q^{\alpha}_{K,1-\beta/2}) + \beta/2$.

Thus, a condition which guarantees that $\mathbb{P}_{f,g}(T_K \leq q^{\alpha}_{K,1-\beta/2}) \leq \beta/2$ will be
enough to ensure that $\mathbb{P}_{f,g}(\Phi_{K,\alpha} = 0) \leq \beta$. The following proposition gives
such a condition.
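
The inequality used above only relies on the definition of $q^{\alpha}_{K,1-\beta/2}$ as a $(1 - \beta/2)$ quantile of the random variable $q^{(N)}_{K,1-\alpha}$. Spelled out, as a short sketch of the argument:

```latex
\begin{align*}
\mathbb{P}_{f,g}(\Phi_{K,\alpha} = 0)
  &= \mathbb{P}_{f,g}\big( T_K \leq q^{(N)}_{K,1-\alpha} \big) \\
  &\leq \mathbb{P}_{f,g}\big( T_K \leq q^{(N)}_{K,1-\alpha},\ q^{(N)}_{K,1-\alpha} \leq q^{\alpha}_{K,1-\beta/2} \big)
      + \mathbb{P}_{f,g}\big( q^{(N)}_{K,1-\alpha} > q^{\alpha}_{K,1-\beta/2} \big) \\
  &\leq \mathbb{P}_{f,g}\big( T_K \leq q^{\alpha}_{K,1-\beta/2} \big) + \beta/2 .
\end{align*}
```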

Proposition 2. Let $\alpha, \beta$ be fixed levels in $(0,1)$, and let us recall that for
any symmetric kernel function $K$ satisfying Assumption 3, $\mathbb{E}_{f,g}[T_K] = \mathcal{E}_K$,
with $\mathcal{E}_K$ given in (2.3). If

(2.8) $\mathcal{E}_K > 2 \sqrt{\frac{2 n^3 A_K + n^2 B_K}{\beta}} + q^{\alpha}_{K,1-\beta/2}$,

with $A_K = \int_{\mathbb{X}} (K[f - g](x))^2 (f + g)(x) \, d\nu_x$, and $B_K = \int_{\mathbb{X}^2} K^2(x, x') (f + g)(x) (f + g)(x') \, d\nu_x d\nu_{x'}$,
then $\mathbb{P}_{f,g}(T_K \leq q^{\alpha}_{K,1-\beta/2}) \leq \beta/2$, so that

$\mathbb{P}_{f,g}(\Phi_{K,\alpha} = 0) \leq \beta$.

Moreover, there exists some constant $\kappa > 0$ such that, for every $K$,

(2.9) $q^{\alpha}_{K,1-\beta/2} \leq \kappa \ln(2/\alpha) \, n \sqrt{\frac{2 B_K}{\beta}}$.

To prove the first part of this result, we simply use Markov's inequality
since obtaining precise constants and dependency in $\beta$ is not crucial here (see
Section 4). The control of $q^{\alpha}_{K,1-\beta/2}$ derives from a property of Rademacher
chaoses combined with an exponential inequality (see [12] and [35]).

The following theorem helps to better understand Proposition 2, and to
deduce from it more recognizable properties in terms of uniform separation
rates.

Theorem 1. Let $\alpha, \beta$ be fixed levels in $(0, 1)$. Let $K$ be a symmetric kernel
function satisfying Assumption 3, and $\Phi_{K,\alpha}$ be the test defined by (2.5).
Let $C_K$ be an upper bound for $\int_{\mathbb{X}^2} K^2(x, x') (f + g)(x) (f + g)(x') \, d\nu_x d\nu_{x'}$.
Then, we have $\mathbb{P}_{f,g}(\Phi_{K,\alpha} = 0) \leq \beta$, as soon as

(2.10) $\|f - g\|^2 \geq \inf_{r > 0} \Big[ \|(f - g) - r^{-1} K[f - g]\|^2 + \frac{4 + 2\sqrt{2} \kappa \ln(2/\alpha)}{n r \sqrt{\beta}} \sqrt{C_K} \Big] + \frac{8 \|f + g\|_\infty}{\beta n}$.

For instance, $C_K$ can be taken as follows.

• $C_K = \|f + g\|_\infty^2 D$ when $K$ is chosen as in the [Projection kernel case],
considering an orthonormal basis $\{\varphi_\lambda, \lambda \in \Lambda\}$ of a $D$-dimensional
subspace $S$ of $L^2(\mathbb{X}, d\nu)$,

• $C_K = \|f + g\|_\infty \|f + g\|_1 D$ when $K$ is chosen as in the [Projection kernel
case], considering an orthonormal basis $\{\varphi_\lambda, \lambda \in \Lambda\}$ of a possibly
infinite dimensional subspace $S$ of $L^2(\mathbb{X}, d\nu)$, which satisfies:

(2.11) $\sup_{x, x' \in \mathbb{X}} \sum_{\lambda \in \Lambda} |\varphi_\lambda(x) \varphi_\lambda(x')| = D < +\infty$,

(2.12) $\int_{\mathbb{X}^2} \Big( \sum_{\lambda \in \Lambda} |\varphi_\lambda(x) \varphi_\lambda(x')| \Big)^2 (f + g)(x') \, d\nu_x d\nu_{x'} < +\infty$,

• $C_K = \|f + g\|_\infty \|f + g\|_1 \|k\|^2 / \prod_{i=1}^d h_i$ when $K$ is chosen as in the
[Approximation kernel case].

Comments.

1. When $K$ is chosen as in the [Projection kernel case], then $K[f - g] = \Pi_S(f - g)$.
Hence by taking $r = 1$ in (2.10), the right hand side of the
inequality reproduces a bias-variance decomposition close to the bias-variance
decomposition for projection estimators, with a variance term of order $\sqrt{D}/n$
instead of $D/n$. This is quite usual for this kind of test (see [2] for instance),
and we know that this leads to sharp upper bounds for the uniform separation
rates over particular classes of alternatives.

2. When $K$ is chosen as in the [Approximation kernel case] with $k$ in
$L^1(\mathbb{R}^d)$, $\int_{\mathbb{R}^d} k(x) \, d\nu_x = 1$, and $h_1 = \ldots = h_d$, then $K[f - g] = k_h * (f - g)$,
and $\|(f - g) - K[f - g]\|$ is a bias term. Hence by taking $r = 1$ in the
inequality (2.10), we still reproduce a bias-variance decomposition, but with
a variance term of order $h_1^{-d/2}/n$, which coincides with the above variance
term in the [Projection kernel case] through the equivalence $h_1^{-d} \sim D$. This
equivalence is usual in the approximation estimation theory (see [52] for
instance for more details).
3. When $K$ is chosen as in the [Reproducing kernel case], if $K$ is proportional
to a kernel from the two above cases, then one can appropriately
choose the constant $r$ such that $\|(f - g) - r^{-1} K[f - g]\|$ is still a bias
term. We thus recover for such kernel functions, such as the Gaussian and
Laplacian kernels, which are commonly used in statistical learning theory,
the same bias-variance decomposition as above. However, in some cases, one
cannot find any normalization constant $r$ for which $\|(f - g) - r^{-1} K[f - g]\|$
can be viewed as a bias term, and the result cannot be interpreted from a
statistical point of view. In these cases in particular, the $L^2$-norm which is
considered in Theorem 1 is not the appropriate one to obtain relevant uniform
separation rates, since it does not necessarily have any link with the
norm of the RKHS $\mathcal{H}_K$. We give in the following theorem a more adequate
result for the specific [Reproducing kernel case].

Theorem 2. Let $\alpha, \beta$ be fixed levels in $(0,1)$, and $\kappa > 0$ be the constant of Proposition 2. Let $\mathbb{X} = \mathbb{R}^d$ and $K$ be a kernel function on $\mathbb{X}\times\mathbb{X}$ chosen as in the [Reproducing kernel case]. Let $\Phi_{K,\alpha}$ be the test function defined by (2.5). We assume furthermore that $\int_{\mathbb{X}} f(x)\,d\nu_x = \int_{\mathbb{X}} g(x)\,d\nu_x = 1$, that $K$ is a bounded measurable characteristic kernel, and that $K(x,x)$ is constant equal to $\kappa_0$. Let $m_f$ and $m_g$ be the mean embeddings of the distributions $f\,d\nu$ and $g\,d\nu$ respectively in $\mathcal{H}_K$. We have $\mathbb{P}_{f,g}(\Phi_{K,\alpha} = 0) \le \beta$ if

$$\|m_f - m_g\|^2_{\mathcal{H}_K} \ge \frac{4\kappa_0}{n}\left(\frac{4}{\beta} + \frac{2 + \kappa\sqrt{2}\ln(2/\alpha)}{\sqrt{\beta}}\right).$$

Comments. 

1. The assumption that $K(x,x)$ is constant is usual, since it is satisfied by any normalized or translation-invariant kernel (see [47] p. 46-47, 57, or [51] for instance). Moreover, as specified in [51] for instance, bounded continuous characteristic and translation-invariant reproducing kernels exist, at least in $\mathbb{R}^d$, where Bochner's theorem makes it possible to characterize them.

2. The result that we have here is in fact comparable to the one obtained by Wellner [53] for two-sample tests in an i.i.d. samples framework. While Wellner's test is based on the estimation of a weak distance between $f\,d\nu$ and $g\,d\nu$, associated with the Sobolev norm with negative index, our test statistic is an unbiased estimator of $\mathcal{E}_K = n^2\|m_f - m_g\|^2_{\mathcal{H}_K}$, where $\|m_f - m_g\|_{\mathcal{H}_K} = \sup_{\|r\|_{\mathcal{H}_K}\le 1} \int_{\mathbb{X}} (f-g)(x)\, r(x)\,d\nu_x$ defines a weak distance between the distributions $f\,d\nu$ and $g\,d\nu$. As in [53] (or [20] beforehand for the problem of testing uniformity), we obtain a uniform separation rate for this weak distance of the same order as the usual parametric separation rate, that is of order $n^{-1/2}$.

2.4. Performance of the Monte Carlo approximation.

2.4.1. Probability of first kind error. In practice, a Monte Carlo method is used to approximate the conditional quantiles $q^{(N)}_{K,1-\alpha}$. It is therefore necessary to address the following question: what can we say about the probabilities of first and second kind errors of the test built with these Monte Carlo approximations? Recall that we consider the test $\Phi_{K,\alpha}$ rejecting $(H_0)$ when $\hat T_K > q^{(N)}_{K,1-\alpha}$, where $\hat T_K$ is defined by (2.1), and $q^{(N)}_{K,1-\alpha}$ is the $(1-\alpha)$ quantile of $\hat T^{\varepsilon}_K$ defined by (2.4) conditionally on $N$. The conditional quantile $q^{(N)}_{K,1-\alpha}$ is estimated by $\hat q^{(N)}_{K,1-\alpha}$ via the Monte Carlo method as follows. Conditionally on $N$, we consider a set of $B$ independent sequences $\{\varepsilon^b, 1\le b\le B\}$, where $\varepsilon^b = (\varepsilon^b_x)_{x\in N}$ is a sequence of i.i.d. Rademacher random variables. We define, for $1\le b\le B$,

$$\hat T^{\varepsilon^b}_K = \sum_{x\ne x'\in N} K(x,x')\,\varepsilon^b_x\,\varepsilon^b_{x'}.$$

Under $(H_0)$, conditionally on $N$, the variables $\hat T^{\varepsilon^b}_K$ have the same distribution function as $\hat T_K$, which is denoted by $F_K$. We denote by $F_{K,B}$ the empirical distribution function (conditionally on $N$) of the sample $(\hat T^{\varepsilon^b}_K, 1\le b\le B)$:

$$F_{K,B}(x) = \frac{1}{B}\sum_{b=1}^{B} \mathbf{1}_{\hat T^{\varepsilon^b}_K \le x}.$$

Then, $\hat q^{(N)}_{K,1-\alpha}$ is defined by $\hat q^{(N)}_{K,1-\alpha} = \inf\{t\in\mathbb{R},\ F_{K,B}(t) \ge 1-\alpha\}$. We finally consider the test given by

(2.13) $\hat\Phi_{K,\alpha} = \mathbf{1}_{\hat T_K > \hat q^{(N)}_{K,1-\alpha}}$.
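The Monte Carlo construction of the test (2.13) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Gaussian kernel, the bandwidth, the toy point configuration and all function names (`gaussian_kernel`, `chaos`, `wild_bootstrap_test`) are hypothetical choices made for the example and do not come from the paper. The marks are +1 for points of $N^1$ and -1 for points of $N^{-1}$, and the quantile is the empirical one defined above.

```python
import math
import random

def gaussian_kernel(h):
    """Hypothetical kernel choice: K(x, x') = k_h(x - x') with Gaussian k."""
    def K(x, y):
        return math.exp(-0.5 * ((x - y) / h) ** 2) / (h * math.sqrt(2 * math.pi))
    return K

def chaos(Kmat, signs):
    """Sum over x != x' of K(x, x') * s_x * s_x' for a symmetric kernel matrix."""
    n = len(signs)
    total = 0.0
    for i in range(n):
        row, si = Kmat[i], signs[i]
        for j in range(i + 1, n):
            total += row[j] * si * signs[j]
    return 2.0 * total  # each unordered pair counts twice

def wild_bootstrap_test(points, marks, K, alpha=0.05, B=200, seed=0):
    rng = random.Random(seed)
    Kmat = [[K(x, y) for y in points] for x in points]
    t_obs = chaos(Kmat, marks)  # observed statistic
    # B bootstrap statistics built from i.i.d. Rademacher signs
    boot = sorted(
        chaos(Kmat, [rng.choice((-1, 1)) for _ in points]) for _ in range(B)
    )
    # empirical (1 - alpha) quantile: inf{t : F_{K,B}(t) >= 1 - alpha}
    k = max(1, math.ceil(B * (1 - alpha) - 1e-12))
    q_hat = boot[k - 1]
    return t_obs, q_hat, t_obs > q_hat

# Toy data: the two processes live on well separated parts of [0, 1].
points = [0.2 + 0.001 * i for i in range(30)] + [0.8 + 0.001 * i for i in range(30)]
marks = [1] * 30 + [-1] * 30
t_obs, q_hat, reject = wild_bootstrap_test(points, marks, gaussian_kernel(0.1))
print(reject)
```

On this deliberately extreme configuration the observed statistic is far above every bootstrap value, so the sketch rejects $(H_0)$.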

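Proposition 3. Let $\alpha$ be some fixed level in $(0,1)$, and $\hat\Phi_{K,\alpha}$ be the test defined by (2.13). Under $(H_0)$,

$$\mathbb{P}\big(\hat\Phi_{K,\alpha} = 1 \,\big|\, N\big) \le \frac{\lfloor B\alpha\rfloor + 1}{B+1}.$$

Comment. For example, if $B = 200$ and $\alpha = 0.05$, $\hat\Phi_{K,\alpha}$ is of level 5.5%.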
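The numerical value in this comment follows directly from the bound of Proposition 3. A minimal check, where the function name is a hypothetical one chosen for the illustration:

```python
import math

def monte_carlo_level_bound(alpha, B):
    """Upper bound (floor(B * alpha) + 1) / (B + 1) from Proposition 3."""
    # small tolerance guards against floating point error in B * alpha
    return (math.floor(B * alpha + 1e-12) + 1) / (B + 1)

bound = monte_carlo_level_bound(0.05, 200)
print(round(100 * bound, 1))  # level in percent
```

With $B = 200$ and $\alpha = 0.05$ the bound equals $11/201$, which is about 5.5%.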

2.4.2. Probability of second kind error. 

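Proposition 4. Let $\alpha$ and $\beta$ be fixed levels in $(0,1)$ such that $\alpha_B = \alpha - \sqrt{\ln B/(2B)} > 0$ and $\beta_B = \beta - 2/B > 0$. Let $\hat\Phi_{K,\alpha}$ be the test given in (2.13). Let $\mathcal{E}_K$, $A_K$, $B_K$ and $\kappa$ be as in Proposition 2, and let $q^{\alpha_B}_{K,1-\beta_B/2}$ be the $(1-\beta_B/2)$ quantile of $q^{(N)}_{K,1-\alpha_B}$. If

(2.14) $\mathcal{E}_K > 2n\sqrt{\dfrac{2nA_K + B_K}{\beta}} + q^{\alpha_B}_{K,1-\beta_B/2}$,

then $\mathbb{P}_{f,g}(\hat\Phi_{K,\alpha} = 0) \le \beta$. Moreover,

(2.15) $q^{\alpha_B}_{K,1-\beta_B/2} \le \kappa \ln(2/\alpha_B)\, n\sqrt{\dfrac{2B_K}{\beta_B}}$.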

Comments. When comparing (2.14) and (2.15) with (2.8) and (2.9) in Proposition 2, we notice that they asymptotically coincide when $B \to +\infty$. Moreover, if $\alpha = \beta = 0.05$ and $B \ge 6000$, the multiplicative factor of $\kappa n\sqrt{B_K}$ is multiplied by a factor of order 1.2 in (2.15) compared with (2.9). If even $B = 200000$, this factor passes from 23.4 in (2.9) to 24.1 in (2.15).
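These orders of magnitude can be checked numerically. The sketch below evaluates the factor $\ln(2/\alpha)\sqrt{2/\beta}$ multiplying $\kappa n\sqrt{B_K}$ in (2.9), and its analogue in (2.15) with $\alpha_B$ and $\beta_B$. The function names are hypothetical and only serve the illustration.

```python
import math

def factor_2_9(alpha, beta):
    """Factor of kappa * n * sqrt(B_K) in (2.9)."""
    return math.log(2 / alpha) * math.sqrt(2 / beta)

def factor_2_15(alpha, beta, B):
    """Same factor in (2.15), with alpha_B and beta_B as in Proposition 4."""
    alpha_B = alpha - math.sqrt(math.log(B) / (2 * B))
    beta_B = beta - 2 / B
    if alpha_B <= 0 or beta_B <= 0:
        raise ValueError("B is too small: alpha_B and beta_B must be positive")
    return math.log(2 / alpha_B) * math.sqrt(2 / beta_B)

base = factor_2_9(0.05, 0.05)
ratio_6000 = factor_2_15(0.05, 0.05, 6000) / base
large_B = factor_2_15(0.05, 0.05, 200000)
print(round(base, 2), round(ratio_6000, 2), round(large_B, 2))
```

The computed values are close to those quoted in the comment: a factor of about 23.3 in (2.9), a ratio of about 1.2 for $B = 6000$, and a factor of about 24.1 for $B = 200000$.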

3. Multiple testing procedures. In the above section, we consider 
testing procedures based on a single kernel function K. Using such single 
tests however leads to the natural question of the choice of the kernel, and/or 
its parameters: the orthonormal family when K is a projection kernel, the 
vector of bandwidths h when K is based on an approximation kernel, the 
parameters of K when it is a reproducing kernel. Authors often choose par- 
ticular parameters regarding the performance properties that they target for 
their tests, or use a data-driven method to choose these parameters which 
is not always justified. For instance, in [22], the parameter of the kernel is 
chosen from a heuristic method. 

In order to avoid choosing particular kernels or parameters, we propose 
in this section to consider some collections of kernel functions instead of 
a single one, and to define multiple testing procedures by aggregating the 
corresponding single tests. We propose an adapted choice for the critical 
value. Then, we prove that these multiple tests satisfy strong statistical 
properties, such as oracle type properties and minimax adaptivity properties 
over many classes of alternatives. 

3.1. Description of the multiple testing procedures. Let us introduce a finite collection $\{K_m, m\in\mathcal{M}\}$ of symmetric kernel functions $\mathbb{X}\times\mathbb{X}\to\mathbb{R}$ satisfying Assumption 3. For every $m$ in $\mathcal{M}$, let $\hat T_{K_m}$ and $\hat T^{\varepsilon}_{K_m}$ be defined by (2.1) and (2.4) respectively, with $K = K_m$, and let $\{w_m, m\in\mathcal{M}\}$ be a collection of positive numbers such that $\sum_{m\in\mathcal{M}} e^{-w_m} \le 1$. For $u$ in $(0,1)$, we denote by $q^{(N)}_{m,1-u}$ the $(1-u)$ quantile of $\hat T^{\varepsilon}_{K_m}$ conditionally on the pooled process $N$. Given $\alpha$ in $(0,1)$, we consider the test which rejects $(H_0)$ when there exists at least one $m$ in $\mathcal{M}$ such that

$$\hat T_{K_m} > q^{(N)}_{m,1-u^{(N)}_\alpha e^{-w_m}},$$

where $u^{(N)}_\alpha$ is defined by

(3.1) $u^{(N)}_\alpha = \sup\Big\{u > 0,\ \mathbb{P}\Big(\sup_{m\in\mathcal{M}}\big(\hat T^{\varepsilon}_{K_m} - q^{(N)}_{m,1-ue^{-w_m}}\big) > 0 \,\Big|\, N\Big) \le \alpha\Big\}$.

Let $\Phi_\alpha$ be the corresponding test function defined by

(3.2) $\Phi_\alpha = \mathbf{1}_{\sup_{m\in\mathcal{M}}\big(\hat T_{K_m} - q^{(N)}_{m,1-u^{(N)}_\alpha e^{-w_m}}\big) > 0}$.

Note that given the pooled process $N$, $u^{(N)}_\alpha$ and the quantile $q^{(N)}_{m,1-u^{(N)}_\alpha e^{-w_m}}$ can be estimated by a Monte Carlo method.

It is quite straightforward to see that this test is of level $\alpha$ and that one can guarantee a probability of second kind error at most equal to $\beta$ in $(0,1)$ if one can guarantee it for one of the single tests rejecting $(H_0)$ when $\hat T_{K_m} > q^{(N)}_{m,1-u^{(N)}_\alpha e^{-w_m}}$. We can thus combine the results of Theorem 1.

3.2. Oracle type conditions for the probability of second kind error. 

3.2.1. Multiple testing procedures based on projection kernels. 

Theorem 3. Let $\alpha, \beta$ be fixed levels in $(0,1)$. Let $\{S_m, m\in\mathcal{M}\}$ be a finite collection of linear subspaces of $L^2(\mathbb{X}, d\nu)$ and for all $m$ in $\mathcal{M}$, let $\{\varphi_\lambda, \lambda\in\Lambda_m\}$ be an orthonormal basis of $S_m$ for $\langle\cdot,\cdot\rangle$. We assume either that $S_m$ has finite dimension $D_m$ or that the conditions (2.11) and (2.12) hold with $\Lambda = \Lambda_m$ and $D = D_m$. We set, for all $m$ in $\mathcal{M}$, $K_m(x,x') = \sum_{\lambda\in\Lambda_m}\varphi_\lambda(x)\varphi_\lambda(x')$. Let $\Phi_\alpha$ be the test defined by (3.2) with the collection of kernels $\{K_m, m\in\mathcal{M}\}$ and a collection $\{w_m, m\in\mathcal{M}\}$ of positive numbers such that $\sum_{m\in\mathcal{M}} e^{-w_m} \le 1$.

Then $\Phi_\alpha$ is a level $\alpha$ test. Moreover, $\mathbb{P}_{f,g}(\Phi_\alpha = 0) \le \beta$ if

(3.3) $\|f-g\|^2 \ge \inf_{m\in\mathcal{M}}\Big\{\|(f-g) - \Pi_{S_m}(f-g)\|^2 + \dfrac{4 + 2\sqrt{2}\kappa(\ln(2/\alpha) + w_m)}{n\sqrt{\beta}}\, M(f,g)\sqrt{D_m} + \dfrac{8\|f+g\|_\infty}{\beta n}\Big\}$,

where $\kappa > 0$ and $M(f,g) = \max\big(\|f+g\|_\infty, \sqrt{\|f+g\|_\infty\|f+g\|_1}\big)$.

Comments. Comparing this result with the one obtained in Theorem 1 for the single test based on a projection kernel, one can see that considering the multiple testing procedure allows us to obtain the infimum over all $m$ in $\mathcal{M}$ in the right hand side of (3.3) at the price of the additional term $w_m$. This result can be viewed as an oracle type property: indeed, without knowing $(f-g)$, we know that the uniform separation rate of the aggregated test is of the same order as the smallest uniform separation rate in the collection of single tests, up to the factor $w_m$. It will be used to prove that our multiple testing procedures are adaptive over various classes of alternatives.

We focus here on two particular examples. The first example involves 
a nested collection of linear subspaces of L 2 ([0, 1]), as in model selection 
estimation approaches. In the second example, we consider a collection of 
one dimensional linear subspaces of L 2 ([0, 1]), and our testing procedure is 
hence related to a thresholding estimation approach. 

[Multiple kernels case - Example 1] Let $\mathbb{X} = [0,1]$ and $\nu$ be the Lebesgue measure on $[0,1]$. Let $\{\varphi_0, \varphi_{(j,k)}, j\in\mathbb{N}, k\in\{0,\dots,2^j-1\}\}$ be the Haar basis of $L^2([0,1])$ with

(3.4) $\varphi_0(x) = \mathbf{1}_{[0,1]}(x)$ and $\varphi_{(j,k)}(x) = 2^{j/2}\psi(2^j x - k)$,

where $\psi(x) = \mathbf{1}_{[0,1/2)}(x) - \mathbf{1}_{[1/2,1)}(x)$. The collection of linear subspaces $\{S_m, m\in\mathcal{M}\}$ is chosen as a collection of nested subspaces generated by subsets of the Haar basis. More precisely, we denote by $S_0$ the subspace of $L^2([0,1])$ generated by $\varphi_0$, and we define $K_0(x,x') = \varphi_0(x)\varphi_0(x')$. We also consider for $J \ge 1$ the subspaces $S_J$ generated by $\{\varphi_\lambda, \lambda\in\{0\}\cup\Lambda_J\}$ with $\Lambda_J = \{(j,k), j\in\{0,\dots,J-1\}, k\in\{0,\dots,2^j-1\}\}$, and $K_J(x,x') = \sum_{\lambda\in\{0\}\cup\Lambda_J}\varphi_\lambda(x)\varphi_\lambda(x')$. Let for some $\bar J \ge 1$, $\mathcal{M}_{\bar J} = \{J, 0\le J\le\bar J\}$, and for every $J$ in $\mathcal{M}_{\bar J}$, $w_J = 2\big(\ln(J+1) + \ln(\pi/\sqrt{6})\big)$.

Let $\Phi^{(1)}_\alpha$ be the test defined by (3.2) with the collection of kernels $\{K_J, J\in\mathcal{M}_{\bar J}\}$ and with $\{w_J, J\in\mathcal{M}_{\bar J}\}$. We obtain from Theorem 3 that there exists $C(\alpha,\beta,\|f\|_\infty,\|g\|_\infty) > 0$ such that $\mathbb{P}_{f,g}(\Phi^{(1)}_\alpha = 0) \le \beta$ if

(3.5) $\|f-g\|^2 \ge C(\alpha,\beta,\|f\|_\infty,\|g\|_\infty)\inf_{J\in\mathcal{M}_{\bar J}}\Big\{\|(f-g) - \Pi_{S_J}(f-g)\|^2 + (\ln(J+2))\dfrac{2^{J/2}}{n}\Big\}$.
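To make the nested Haar kernels of Example 1 concrete, the following Python sketch evaluates $K_J$ from its definition. It also compares it with the closed form $2^J \mathbf{1}\{\lfloor 2^J x\rfloor = \lfloor 2^J x'\rfloor\}$, which is a standard property of the Haar system ($S_J$ is the space of functions that are constant on dyadic intervals of length $2^{-J}$) and is not stated in the text. Function names are hypothetical, and the comparison is restricted to points of $[0,1)$.

```python
import math
import random

def psi(x):
    """Haar mother wavelet: 1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1.0:
        return -1.0
    return 0.0

def phi(j, k, x):
    """phi_(j,k)(x) = 2^(j/2) * psi(2^j x - k), as in (3.4)."""
    return 2 ** (j / 2) * psi(2 ** j * x - k)

def haar_kernel(J, x, y):
    """K_J(x, y): phi_0 term plus all phi_(j,k) products with j < J."""
    total = 1.0  # phi_0(x) * phi_0(y) for x, y in [0, 1]
    for j in range(J):
        for k in range(2 ** j):
            total += phi(j, k, x) * phi(j, k, y)
    return total

def histogram_kernel(J, x, y):
    """Closed form: 2^J if x and y share a dyadic cell of length 2^-J."""
    same_cell = math.floor(2 ** J * x) == math.floor(2 ** J * y)
    return float(2 ** J) if same_cell else 0.0

rng = random.Random(1)
pairs = [(rng.random(), rng.random()) for _ in range(200)]
max_gap = max(abs(haar_kernel(4, x, y) - histogram_kernel(4, x, y)) for x, y in pairs)
print(max_gap)
```

The two expressions agree up to floating point error, which shows that the test statistic built on $K_J$ only depends on how the points fall into the $2^J$ dyadic cells.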



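[Multiple kernels case - Example 2] Let $\mathbb{X} = [0,1]$ and $\nu$ be the Lebesgue measure on $[0,1]$. Let $\{\varphi_0, \varphi_{(j,k)}, j\in\mathbb{N}, k\in\{0,\dots,2^j-1\}\}$ still be the Haar basis of $L^2([0,1])$ defined by (3.4). Let for some $\tilde J \ge 1$,

$$\Lambda_{\tilde J} = \{(j,k),\ j\in\{0,\dots,\tilde J-1\},\ k\in\{0,\dots,2^j-1\}\}.$$

For any $\lambda$ in $\{0\}\cup\Lambda_{\tilde J}$, we consider the subspace $S_\lambda$ of $L^2([0,1])$ generated by $\varphi_\lambda$, and $K_\lambda(x,x') = \varphi_\lambda(x)\varphi_\lambda(x')$. Let $\Phi^{(2)}_\alpha$ be the test defined by (3.2) with the collection of kernels $\{K_\lambda, \lambda\in\{0\}\cup\Lambda_{\tilde J}\}$, with $w_0 = \ln(2)$, and $w_{(j,k)} = \ln(2^j) + 2\big(\ln(j+1) + \ln(\pi/\sqrt{3})\big)$ for $j\in\mathbb{N}$, $k\in\{0,\dots,2^j-1\}$. We obtain from Theorem 3 and Pythagoras' theorem that there is some constant $C(\alpha,\beta,\|f\|_\infty,\|g\|_\infty) > 0$ such that if there exists $\lambda$ in $\{0\}\cup\Lambda_{\tilde J}$ for which

$$\|\Pi_{S_\lambda}(f-g)\|^2 \ge C(\alpha,\beta,\|f\|_\infty,\|g\|_\infty)\frac{w_\lambda}{n},$$

then $\mathbb{P}_{f,g}(\Phi^{(2)}_\alpha = 0) \le \beta$. If $\mathcal{M}_{\tilde J} = \{m, m\subset\{0\}\cup\Lambda_{\tilde J}\}$, the above condition is equivalent to saying that there exists $m$ in $\mathcal{M}_{\tilde J}$ such that

$$\|\Pi_{S_m}(f-g)\|^2 \ge C(\alpha,\beta,\|f\|_\infty,\|g\|_\infty)\frac{\sum_{\lambda\in m} w_\lambda}{n},$$

where $S_m$ is generated by $\{\varphi_\lambda, \lambda\in m\}$. Hence, there exists some constant $C(\alpha,\beta,\|f\|_\infty,\|g\|_\infty) > 0$ such that $\mathbb{P}_{f,g}(\Phi^{(2)}_\alpha = 0) \le \beta$ if

(3.6) $\|f-g\|^2 \ge C(\alpha,\beta,\|f\|_\infty,\|g\|_\infty)\inf_{m\in\mathcal{M}_{\tilde J}}\Big\{\|(f-g) - \Pi_{S_m}(f-g)\|^2 + \dfrac{\sum_{\lambda\in m} w_\lambda}{n}\Big\}$.

3.2.2. Multiple testing procedures based on approximation kernels.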



Theorem 4. Let $\alpha, \beta$ be fixed levels in $(0,1)$, $\mathbb{X} = \mathbb{R}^d$ and let $\nu$ be the Lebesgue measure on $\mathbb{R}^d$. Let $\{k_{m_1}, m_1\in\mathcal{M}_1\}$ be a collection of approximation kernels such that $\int_{\mathbb{X}} k^2_{m_1}(x)\,d\nu_x < +\infty$, $k_{m_1}(x) = k_{m_1}(-x)$, and a collection $\{h_{m_2}, m_2\in\mathcal{M}_2\}$, where each $h_{m_2}$ is a vector of $d$ positive bandwidths $(h_{m_2,1},\dots,h_{m_2,d})$. We set $\mathcal{M} = \mathcal{M}_1\times\mathcal{M}_2$, and for all $m = (m_1,m_2)$ in $\mathcal{M}$, $x = (x_1,\dots,x_d)$, $x' = (x'_1,\dots,x'_d)$ in $\mathbb{R}^d$,

$$K_m(x,x') = k_{m_1,h_{m_2}}(x-x') = \frac{1}{\prod_{i=1}^d h_{m_2,i}}\, k_{m_1}\Big(\frac{x_1-x'_1}{h_{m_2,1}},\dots,\frac{x_d-x'_d}{h_{m_2,d}}\Big).$$

Let $\Phi_\alpha$ be the test defined by (3.2) with $\{K_m, m\in\mathcal{M}\}$ and a collection $\{w_m, m\in\mathcal{M}\}$ of positive numbers such that $\sum_{m\in\mathcal{M}} e^{-w_m} \le 1$.

Then $\Phi_\alpha$ is a level $\alpha$ test. Moreover, there exists $\kappa > 0$ such that if

$$\|f-g\|^2 \ge \inf_{(m_1,m_2)\in\mathcal{M}}\Big\{\|(f-g) - k_{m_1,h_{m_2}} * (f-g)\|^2 + \frac{4 + 2\sqrt{2}\kappa(\ln(2/\alpha) + w_m)}{n\sqrt{\beta}}\sqrt{\frac{\|f+g\|_\infty\|f+g\|_1\|k_{m_1}\|^2}{\prod_{i=1}^d h_{m_2,i}}}\Big\} + \frac{8\|f+g\|_\infty}{\beta n},$$

then

$$\mathbb{P}_{f,g}(\Phi_\alpha = 0) \le \beta.$$

We focus here on two particular examples. The first example involves a 
collection of non necessarily integrable approximation kernels with a collec- 
tion of bandwidths vectors whose components are the same in every direc- 
tion. The second example involves a single integrable approximation kernel, 
but with a collection of bandwidths vectors whose components may differ 
according to every direction. 

[Multiple kernels case - Example 3] Let $\mathbb{X} = \mathbb{R}^d$ and $\nu$ be the Lebesgue measure on $\mathbb{R}^d$. We set $\mathcal{M}_1 = \mathbb{N}\setminus\{0\}$ and $\mathcal{M}_2 = \mathbb{N}$. For $m_1$ in $\mathcal{M}_1$, let $k_{m_1}$ be a kernel such that $\int k^2_{m_1}(x)\,d\nu_x < +\infty$ and $k_{m_1}(x) = k_{m_1}(-x)$, not necessarily integrable, whose Fourier transform is defined when $k_{m_1}\in L^1(\mathbb{R}^d)\cap L^2(\mathbb{R}^d)$ by $\hat k_{m_1}(u) = \int_{\mathbb{R}^d} k_{m_1}(x)e^{i\langle x,u\rangle}\,d\nu_x$ and is extended to $k_{m_1}\in L^2(\mathbb{R}^d)$ in the Plancherel sense. We assume that for every $m_1$ in $\mathcal{M}_1$, $\|k_{m_1}\|_\infty < +\infty$, and

(3.7) $\operatorname{Ess\,sup}_{u\in\mathbb{R}^d\setminus\{0\}} \dfrac{|1 - \hat k_{m_1}(u)|}{\|u\|_d^{m_1}} \le C$,

for some $C > 0$, where $\|u\|_d$ denotes the euclidean norm of $u$. Note that the sinc kernel, the spline type kernel and Pinsker's kernel given in [52] for instance satisfy this condition, which can be viewed as an extension of the integrability condition (see [52] p. 26-27 for more details). For $m_2$ in $\mathcal{M}_2$, let $h_{m_2} = (2^{-m_2},\dots,2^{-m_2})$ and for $m = (m_1,m_2)$ in $\mathcal{M} = \mathcal{M}_1\times\mathcal{M}_2$, let

$$K_m(x,x') = k_{m_1,h_{m_2}}(x-x') = \frac{1}{2^{-dm_2}}\, k_{m_1}\Big(\frac{x_1-x'_1}{2^{-m_2}},\dots,\frac{x_d-x'_d}{2^{-m_2}}\Big).$$

We take $w_{(m_1,m_2)} = 2\big(\ln(m_1(m_2+1)) + \ln(\pi^2/6)\big)$, so $\sum_{m\in\mathcal{M}} e^{-w_m} \le 1$. Let $\Phi^{(3)}_\alpha$ be the test defined by (3.2) with the collection of kernels $\{K_m, m\in\mathcal{M}\}$ and $\{w_m, m\in\mathcal{M}\}$. We obtain from Theorem 4 that there exists $C(\alpha,\beta) > 0$ such that $\mathbb{P}_{f,g}(\Phi^{(3)}_\alpha = 0) \le \beta$ if

(3.8) $\|f-g\|^2 \ge C(\alpha,\beta)\inf_{(m_1,m_2)\in\mathcal{M}}\Big\{\|(f-g) - k_{m_1,h_{m_2}} * (f-g)\|^2 + \dfrac{w_{(m_1,m_2)}}{n}\sqrt{\dfrac{\|f+g\|_\infty\|f+g\|_1\|k_{m_1}\|^2}{2^{-dm_2}}} + \dfrac{\|f+g\|_\infty}{n}\Big\}$.

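[Multiple kernels case - Example 4] Let $\mathbb{X} = \mathbb{R}^d$ and $\nu$ be the Lebesgue measure on $\mathbb{R}^d$. Let $\mathcal{M}_1 = \{1\}$ and $\mathcal{M}_2 = \mathbb{N}^d$. For $x = (x_1,\dots,x_d)$ in $\mathbb{R}^d$, let $k_1(x) = \prod_{i=1}^d k_{1,i}(x_i)$ where the $k_{1,i}$'s are real valued kernels such that $k_{1,i}\in L^1(\mathbb{R})\cap L^2(\mathbb{R})$, $k_{1,i}(x_i) = k_{1,i}(-x_i)$, and $\int_{\mathbb{R}} k_{1,i}(x_i)\,dx_i = 1$. For $m_2 = (m_{2,1},\dots,m_{2,d})$ in $\mathcal{M}_2$, $h_{m_2,i} = 2^{-m_{2,i}}$ and for $m = (m_1,m_2)$ in $\mathcal{M} = \mathcal{M}_1\times\mathcal{M}_2$,

$$K_m(x,x') = k_{m_1,h_{m_2}}(x-x') = \prod_{i=1}^d \frac{1}{h_{m_2,i}}\, k_{1,i}\Big(\frac{x_i-x'_i}{h_{m_2,i}}\Big).$$

We also set $w_{(1,m_2)} = 2\sum_{i=1}^d\big(\ln(m_{2,i}+1) + \ln(\pi/\sqrt{6})\big)$, so that $\sum_{m\in\mathcal{M}_1\times\mathcal{M}_2} e^{-w_m} = 1$. Let $\Phi^{(4)}_\alpha$ be the test defined by (3.2) with the collections $\{K_m, m\in\mathcal{M}\}$ and $\{w_m, m\in\mathcal{M}\}$. We deduce from Theorem 4 that there exists $C(\alpha,\beta) > 0$ such that $\mathbb{P}_{f,g}(\Phi^{(4)}_\alpha = 0) \le \beta$ if

(3.9) $\|f-g\|^2 \ge C(\alpha,\beta)\inf_{m_2\in\mathcal{M}_2}\Big\{\|(f-g) - k_{1,h_{m_2}} * (f-g)\|^2 + \dfrac{w_{(1,m_2)}}{n}\sqrt{\dfrac{\|f+g\|_\infty\|f+g\|_1\|k_1\|^2}{\prod_{i=1}^d h_{m_2,i}}} + \dfrac{\|f+g\|_\infty}{n}\Big\}$.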
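The weights of Examples 1 to 4 are calibrated so that $\sum_m e^{-w_m} \le 1$, which rests on the identity $\sum_{k\ge 1} k^{-2} = \pi^2/6$. The following Python sketch checks this numerically on truncated index sets. The truncation level `T`, the dimension $d = 2$ used for Example 4 and the function names are assumptions of the illustration only.

```python
import math

LOG_C = math.log(math.pi / math.sqrt(6))

def w_example1(J):
    return 2 * (math.log(J + 1) + LOG_C)

def w_example2(j):
    # weight w_(j,k); it does not depend on k
    return math.log(2 ** j) + 2 * (math.log(j + 1) + math.log(math.pi / math.sqrt(3)))

def w_example3(m1, m2):
    return 2 * (math.log(m1 * (m2 + 1)) + math.log(math.pi ** 2 / 6))

def w_example4(m2):
    # m2 is a tuple (m_{2,1}, ..., m_{2,d})
    return 2 * sum(math.log(mi + 1) + LOG_C for mi in m2)

T = 300  # truncation level of the infinite index sets
s1 = sum(math.exp(-w_example1(J)) for J in range(T))
# Example 2: w_0 = ln(2), and 2^j indices k at each level j
s2 = math.exp(-math.log(2)) + sum((2 ** j) * math.exp(-w_example2(j)) for j in range(T))
s3 = sum(math.exp(-w_example3(m1, m2)) for m1 in range(1, T + 1) for m2 in range(T))
s4 = sum(math.exp(-w_example4((a, b))) for a in range(T) for b in range(T))
print(round(s1, 3), round(s2, 3), round(s3, 3), round(s4, 3))
```

All four partial sums stay below 1 and approach it as `T` grows, in line with the conditions required in Theorems 3 and 4.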

3.3. Uniform separation rates over various classes of alternatives. We 
here evaluate the uniform separation rates, defined by (1.1), of the multiple 
testing procedures introduced above over several classes of alternatives based 
on Besov and weak Besov bodies when X = [0, 1], or Sobolev and anisotropic 
Besov-Nikol'skii balls when X = R d . 

3.3.1. Uniform separation rates for Besov and weak Besov bodies. In this 
section, we adapt to the present setting the results that we obtained in [18]. 

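Given $\alpha$ in $(0,1)$, let $\Phi^{(1)}_{\alpha/2}$ and $\Phi^{(2)}_{\alpha/2}$ be the tests defined in [Multiple kernels case - Example 1] and [Multiple kernels case - Example 2] (with $\alpha$ replaced by $\alpha/2$), and let $\Psi_\alpha = \max\big(\Phi^{(1)}_{\alpha/2}, \Phi^{(2)}_{\alpha/2}\big)$.

Recall that these tests are constructed from the Haar basis $\{\varphi_0, \varphi_{(j,k)}, j\in\mathbb{N}, k\in\{0,\dots,2^j-1\}\}$ of $L^2([0,1])$ defined by (3.4). We define for $\delta > 0$, $R > 0$ the Besov body $\mathcal{B}^\delta_{2,\infty}(R)$ as follows:

$$\mathcal{B}^\delta_{2,\infty}(R) = \Big\{s = \alpha_0\varphi_0 + \sum_{j\in\mathbb{N}}\sum_{k=0}^{2^j-1}\alpha_{(j,k)}\varphi_{(j,k)} \ /\ \alpha_0^2 \le R^2,\ \forall j\in\mathbb{N},\ \sum_{k=0}^{2^j-1}\alpha^2_{(j,k)} \le R^2 2^{-2j\delta}\Big\}.$$

We also consider the weak Besov body given for $\gamma > 0$, $R' > 0$ by

$$\mathcal{W}_\gamma(R') = \Big\{s = \alpha_0\varphi_0 + \sum_{j\in\mathbb{N}}\sum_{k=0}^{2^j-1}\alpha_{(j,k)}\varphi_{(j,k)} \ /\ \forall t > 0,\ \alpha_0^2\mathbf{1}_{\alpha_0^2\le t} + \sum_{j\in\mathbb{N}}\sum_{k=0}^{2^j-1}\alpha^2_{(j,k)}\mathbf{1}_{\alpha^2_{(j,k)}\le t} \le R'^2 t^{\frac{2\gamma}{1+2\gamma}}\Big\}.$$

Corollary 1. Assume that $\ln\ln n \ge 1$, $2^{\bar J} \ge n^2$, and $\tilde J = +\infty$. Then, for any $\delta > 0$, $\gamma > 0$, $R, R', R'' > 0$, if

$$\mathcal{B}_{\delta,\gamma,\infty}(R,R',R'') = \big\{(f,g) \ /\ (f-g)\in\mathcal{B}^\delta_{2,\infty}(R)\cap\mathcal{W}_\gamma(R'),\ \max(\|f\|_\infty,\|g\|_\infty)\le R''\big\},$$

$\rho(\Psi_\alpha, \mathcal{B}_{\delta,\gamma,\infty}(R,R',R''),\beta)$, defined by (1.1), is upper bounded by

(i) $C(\delta,\gamma,R,R',R'',\alpha,\beta)\Big(\dfrac{\ln\ln n}{n}\Big)^{\frac{2\delta}{4\delta+1}}$ if $\delta \ge \gamma/2$,

(ii) $C(\delta,\gamma,R,R',R'',\alpha,\beta)\Big(\dfrac{\ln n}{n}\Big)^{\frac{\gamma}{1+2\gamma}}$ if $\delta < \gamma/2$.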

Comments. 

1. Lower bounds for the minimax separation rates over $\mathcal{B}_{\delta,\gamma,\infty}(R,R',R'')$ are also available, proving that the test $\Psi_\alpha$ is adaptive in the minimax sense over $\mathcal{B}_{\delta,\gamma,\infty}(R,R',R'')$, up to a $\ln\ln n$ factor if $\delta \ge \max(\gamma/2, \gamma/(1+2\gamma))$ and exactly if $\delta < \gamma/2$ and $\gamma > 1/2$. In the other cases, the exact rate is unknown.

2. Let us mention here that our classes of alternatives are not defined in the same way as in [7] in the classical two-sample problem for i.i.d. samples, since the classes of alternatives $(f,g)$ of [7] are such that $f$ and $g$ both belong to a Besov ball. Here the smoothness condition is only required on the difference $(f-g)$. In particular, the functions $f$ and $g$ might be very irregular, but as long as their difference is smooth, the probability of second kind error of the test will be controlled.

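3.3.2. Uniform separation rates for Sobolev and anisotropic Nikol'skii-Besov balls. Let $\Phi^{(3)}_\alpha$ be defined as in [Multiple kernels case - Example 3], and let us introduce for $\delta > 0$ the Sobolev ball $\mathcal{S}^\delta_d(R)$ defined by

$$\mathcal{S}^\delta_d(R) = \Big\{s : \mathbb{R}^d\to\mathbb{R} \ /\ s\in L^1(\mathbb{R}^d)\cap L^2(\mathbb{R}^d),\ \int_{\mathbb{R}^d}\|u\|_d^{2\delta}|\hat s(u)|^2\,du \le (2\pi)^d R^2\Big\},$$

where $\|u\|_d$ denotes the euclidean norm of $u$ and $\hat s$ denotes the Fourier transform of $s$: $\hat s(u) = \int_{\mathbb{R}^d} s(x)e^{i\langle x,u\rangle}\,dx$.

Corollary 2. Assume that $\ln\ln n \ge 1$. For any $\delta, R, R', R'' > 0$, if

$$\mathcal{S}^\delta_d(R,R',R'') = \big\{(f,g) \ /\ (f-g)\in\mathcal{S}^\delta_d(R),\ \max(\|f\|_1,\|g\|_1)\le R',\ \max(\|f\|_\infty,\|g\|_\infty)\le R''\big\},$$

then

$$\rho\big(\Phi^{(3)}_\alpha, \mathcal{S}^\delta_d(R,R',R''),\beta\big) \le C(\delta,\alpha,\beta,R,R',R'',d)\Big(\frac{\ln\ln n}{n}\Big)^{\frac{2\delta}{d+4\delta}}.$$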



Comments. From [42], we know that, in the density model, the minimax adaptive estimation rate over $\mathcal{S}^\delta_d(R)$ is of order $n^{-\delta/(d+2\delta)}$ when $\delta > d/2$. Rigollet and Tsybakov construct some aggregated density estimators, based on Pinsker's kernel, that achieve this rate with exact constants. In the same way, the test $\Phi^{(3)}_\alpha$ consists in an aggregation of some tests based on a collection of kernels, that may be for instance a collection of Pinsker's kernels. It achieves over $\mathcal{S}^\delta_d(R,R',R'')$ a uniform separation rate of order $n^{-2\delta/(d+4\delta)}$ up to a $\ln\ln n$ factor. This rate is now known to be the optimal adaptive minimax rate of testing when $d = 1$ in several models (see [49] in a Gaussian model or [27] in the density model for instance). From the results of [24], we can conjecture that our rates are also optimal when $d > 1$.

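Let $\Phi^{(4)}_\alpha$ be the test defined in [Multiple kernels case - Example 4]. Let $\theta = (\theta_1,\dots,\theta_d)$, where for every $i = 1\dots d$, $\theta_i$ is a positive integer. Assume furthermore that $\int_{\mathbb{R}} |k_{1,i}(x_i)||x_i|^{\theta_i}\,dx_i < +\infty$, and $\int_{\mathbb{R}} k_{1,i}(x_i)x_i^j\,dx_i = 0$ for every $i = 1\dots d$ and $j = 1\dots\theta_i$.

For $\delta = (\delta_1,\dots,\delta_d)\in\prod_{i=1}^d (0,\theta_i]$ and $R > 0$, we consider the anisotropic Nikol'skii-Besov ball $\mathcal{N}^\delta_{2,d}(R)$ defined by:

$$\mathcal{N}^\delta_{2,d}(R) = \Big\{s : \mathbb{R}^d\to\mathbb{R} \ /\ s \text{ has continuous partial derivatives } D_i^{\lfloor\delta_i\rfloor} \text{ of order } \lfloor\delta_i\rfloor \text{ w.r.t. } u_i, \text{ and } \forall i = 1\dots d,\ u_1,\dots,u_d, v\in\mathbb{R},$$
$$\big\|D_i^{\lfloor\delta_i\rfloor}s(u_1,\dots,u_i+v,\dots,u_d) - D_i^{\lfloor\delta_i\rfloor}s(u_1,\dots,u_d)\big\|_2 \le R|v|^{\delta_i-\lfloor\delta_i\rfloor}\Big\}.$$

Corollary 3. Assume that $\ln\ln n \ge 1$. For any $\delta = (\delta_1,\dots,\delta_d)$ in $\prod_{i=1}^d (0,\theta_i]$ and $R, R', R'' > 0$, if

$$\mathcal{N}^\delta_{2,d}(R,R',R'') = \big\{(f,g) \ /\ (f-g)\in\mathcal{N}^\delta_{2,d}(R),\ \max(\|f\|_1,\|g\|_1)\le R',\ \max(\|f\|_\infty,\|g\|_\infty)\le R''\big\},$$

then, for $1/\bar\delta = \sum_{i=1}^d 1/\delta_i$,

$$\rho\big(\Phi^{(4)}_\alpha, \mathcal{N}^\delta_{2,d}(R,R',R''),\beta\big) \le C(\delta,\alpha,\beta,R,R',R'',d)\Big(\frac{\ln\ln n}{n}\Big)^{\frac{2\bar\delta}{1+4\bar\delta}}.$$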



Comments. When $d = 1$, from [27], we know that in the density model, the adaptive minimax rate of testing over a Nikol'skii class with smoothness parameter $\delta$ is of order $(\ln\ln n/n)^{2\delta/(1+4\delta)}$. We find here an upper bound similar to this univariate rate, but where $\delta$ is replaced by $\bar\delta$. Such results were obtained in a multivariate density estimation context in [21] where the adaptive minimax estimation rates over the anisotropic Nikol'skii classes are proved to be of order $n^{-\bar\delta/(1+2\bar\delta)}$, and where adaptive kernel density estimators are proposed. Moreover, the minimax rates of testing obtained recently in [29] over anisotropic periodic Sobolev balls, but in the Gaussian white noise model, are of the same order as the upper bounds obtained here.

4. Proofs. 

4.1. Proof of Proposition 1. All along the proof, $\int$ denotes $\int_{\mathbb{X}}$. Recalling that the marked point processes are characterized by their Laplace functional (see [11] for instance), we first aim at computing $\mathbb{E}\big[\exp\big(\int h\,dN\big)\big]$ for any bounded measurable function $h$ on $\mathbb{X}$. Since $N^1$ and $N^{-1}$ are independent,

$$\mathbb{E}\Big[\exp\Big(\int h\,dN\Big)\Big] = \mathbb{E}\Big[\exp\Big(\int h\,dN^1\Big)\Big]\,\mathbb{E}\Big[\exp\Big(\int h\,dN^{-1}\Big)\Big].$$

The Laplace functional of $N^1$ is given by

$$\mathbb{E}\Big[\exp\Big(\int h\,dN^1\Big)\Big] = \exp\Big(\int (e^h - 1)f\,d\mu\Big),$$

and the Laplace functional of $N^{-1}$ has the same form, replacing $f$ by $g$, so

$$\mathbb{E}\Big[\exp\Big(\int h\,dN\Big)\Big] = \exp\Big(\int (e^h - 1)(f+g)\,d\mu\Big),$$

which is the Laplace functional of a Poisson process with intensity $(f+g)$ w.r.t. $\mu$. Therefore, $N$ is a Poisson process with intensity $(f+g)$ w.r.t. $\mu$.

In order to prove (2.6), we then give an explicit expression of the function:

$$t = (t_x)_{x\in N} \mapsto \Phi(t,N) = \mathbb{E}\Big[\exp\Big(\sum_{x\in N} t_x\varepsilon^0_x\Big)\,\Big|\, N\Big],$$

which characterizes the distribution of $(\varepsilon^0_x)_{x\in N}$ conditionally on $N$.

Let $\lambda$ be a bounded measurable function defined on $\mathbb{X}$, and let

$$E_\lambda = \mathbb{E}\Big[\exp\Big(\int\lambda\,dN\Big)\exp\Big(\sum_{x\in N} t_x\varepsilon^0_x\Big)\Big].$$

By definition of $(\varepsilon^0_x)_{x\in N}$ and by independence of $N^1$ and $N^{-1}$, we have that

$$E_\lambda = \mathbb{E}\Big[\exp\Big(\int(\lambda(x)+t_x)\,dN^1_x\Big)\exp\Big(\int(\lambda(x)-t_x)\,dN^{-1}_x\Big)\Big]$$
$$= \mathbb{E}\Big[\exp\Big(\int(\lambda(x)+t_x)\,dN^1_x\Big)\Big]\,\mathbb{E}\Big[\exp\Big(\int(\lambda(x)-t_x)\,dN^{-1}_x\Big)\Big]$$
$$= \exp\Big(\int\Big[\big(e^{\lambda(x)+t_x}-1\big)f(x) + \big(e^{\lambda(x)-t_x}-1\big)g(x)\Big]d\mu_x\Big).$$

Then, for $h(x) = \lambda(x) + \ln\Big(\dfrac{e^{t_x}f(x) + e^{-t_x}g(x)}{(f+g)(x)}\Big)$,

$$E_\lambda = \exp\Big(\int\big(e^{h(x)}-1\big)(f+g)(x)\,d\mu_x\Big) = \mathbb{E}\Big[\exp\Big(\int h\,dN\Big)\Big].$$

Hence, for every bounded measurable function $\lambda$ defined on $\mathbb{X}$,

$$\mathbb{E}\Big[\exp\Big(\int\lambda\,dN\Big)\exp\Big(\sum_{x\in N} t_x\varepsilon^0_x\Big)\Big] = \mathbb{E}\Big[\exp\Big(\int\lambda\,dN\Big)\prod_{x\in N}\Big(e^{t_x}\frac{f(x)}{(f+g)(x)} + e^{-t_x}\frac{g(x)}{(f+g)(x)}\Big)\Big].$$

Since the marked point processes are characterized by their Laplace functional, this implies that

$$\Phi(t,N) = \mathbb{E}\Big[\exp\Big(\sum_{x\in N} t_x\varepsilon^0_x\Big)\,\Big|\, N\Big] = \prod_{x\in N}\Big(e^{t_x}\frac{f(x)}{(f+g)(x)} + e^{-t_x}\frac{g(x)}{(f+g)(x)}\Big),$$

which concludes the proof.



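4.2. Proof of Proposition 2. Let us prove the first part of Proposition 2. Recall that $q^\alpha_{K,1-\beta/2}$ denotes the $(1-\beta/2)$ quantile of $q^{(N)}_{K,1-\alpha}$, which is the $(1-\alpha)$ quantile of $\hat T^\varepsilon_K$ conditionally on $N$. We here want to find a condition on $\hat T_K$, or more precisely on $\mathcal{E}_K = \mathbb{E}_{f,g}[\hat T_K]$, ensuring that

$$\mathbb{P}_{f,g}\big(\hat T_K \le q^\alpha_{K,1-\beta/2}\big) \le \beta/2.$$

From Markov's inequality, we have that for any $x > 0$,

$$\mathbb{P}_{f,g}\big(\big|-\hat T_K + \mathcal{E}_K\big| \ge x\big) \le \frac{\mathrm{Var}(\hat T_K)}{x^2}.$$

Let us compute $\mathrm{Var}(\hat T_K) = \mathbb{E}_{f,g}[\hat T_K^2] - \mathcal{E}_K^2$. Let $\mathbb{X}^{[3]}$ and $\mathbb{X}^{[4]}$ be the sets $\{(x,y,u)\in\mathbb{X}^3,\ x,y,u \text{ all different}\}$ and $\{(x,y,u,v)\in\mathbb{X}^4,\ x,y,u,v \text{ all different}\}$ respectively. Since

$$\mathbb{E}_{f,g}[\hat T_K^2] = \mathbb{E}_{f,g}\Big[\mathbb{E}\Big[\Big(\int_{\mathbb{X}^{[2]}} K(x,x')\varepsilon^0_x\varepsilon^0_{x'}\,dN_x\,dN_{x'}\Big)^2\,\Big|\, N\Big]\Big],$$

by using (2.6),

$$\mathbb{E}_{f,g}[\hat T_K^2] = \mathbb{E}_{f,g}\Big[\int_{\mathbb{X}^{[4]}} K(x,y)K(u,v)\frac{f-g}{f+g}(x)\frac{f-g}{f+g}(y)\frac{f-g}{f+g}(u)\frac{f-g}{f+g}(v)\,dN_x\,dN_y\,dN_u\,dN_v\Big]$$
$$+\ 4\,\mathbb{E}_{f,g}\Big[\int_{\mathbb{X}^{[3]}} K(x,y)K(x,u)\frac{f-g}{f+g}(y)\frac{f-g}{f+g}(u)\,dN_x\,dN_y\,dN_u\Big] + 2\,\mathbb{E}_{f,g}\Big[\int_{\mathbb{X}^{[2]}} K^2(x,y)\,dN_x\,dN_y\Big].$$

Now, from Lemma 5.4.III in [11] on factorial moments measures applied to Poisson processes, we deduce that

$$\mathbb{E}_{f,g}[\hat T_K^2] = \int_{\mathbb{X}^4} K(x,y)K(u,v)(f-g)(x)(f-g)(y)(f-g)(u)(f-g)(v)\,d\mu_x\,d\mu_y\,d\mu_u\,d\mu_v$$
$$+\ 4\int_{\mathbb{X}^3} K(x,y)K(x,u)(f+g)(x)(f-g)(y)(f-g)(u)\,d\mu_x\,d\mu_y\,d\mu_u + 2\int_{\mathbb{X}^2} K^2(x,y)(f+g)(x)(f+g)(y)\,d\mu_x\,d\mu_y.$$

Note that the three above integrals are finite, thanks to Assumptions 1, 2 and 3. We finally obtain that $\mathbb{E}_{f,g}[\hat T_K^2] = \mathcal{E}_K^2 + 4n^3A_K + 2n^2B_K$, and for $x > 0$,

$$\mathbb{P}_{f,g}\big(\big|-\hat T_K + \mathcal{E}_K\big| \ge x\big) \le \frac{4n^3A_K + 2n^2B_K}{x^2}.$$

Taking $x = 2n\sqrt{(2nA_K + B_K)/\beta}$ in the above inequality leads to

(4.1) $\mathbb{P}_{f,g}\Big(\big|-\hat T_K + \mathcal{E}_K\big| \ge 2n\sqrt{\dfrac{2nA_K + B_K}{\beta}}\Big) \le \dfrac{\beta}{2}$.

Therefore, if $\mathcal{E}_K > 2n\sqrt{\dfrac{2nA_K + B_K}{\beta}} + q^\alpha_{K,1-\beta/2}$, then $\mathbb{P}_{f,g}\big(\hat T_K \le q^\alpha_{K,1-\beta/2}\big) \le \beta/2$, so $\mathbb{P}_{f,g}(\Phi_{K,\alpha} = 0) \le \beta$.

Let us now give a sharp upper bound for $q^\alpha_{K,1-\beta/2}$. Reasoning conditionally on $N$, we recognize in $\hat T^\varepsilon_K$ a homogeneous Rademacher chaos, as defined by de la Peña and Giné [12], of the form $X = \sum_{i\ne i'} x_{i,i'}\varepsilon_i\varepsilon_{i'}$, where the $x_{i,i'}$'s are some real deterministic numbers and $(\varepsilon_i)_{i\in\mathbb{N}}$ is a sequence of i.i.d. Rademacher variables. Corollary 3.2.6 of [12] states that there exists some absolute constant $\kappa > 0$ such that if $\sigma^2 = \mathbb{E}[X^2] = \sum_{i\ne i'} x^2_{i,i'}$, then

$$\mathbb{E}\big[\exp\big(|X|/(\kappa\sigma)\big)\big] \le 2.$$

Hence by Markov's inequality,

$$\mathbb{P}\big(|X| \ge \kappa\sigma\ln(2/\alpha)\big) \le \alpha.$$

Note that one could find more precise constants with the results of [35]. Applying this result to $\hat T^\varepsilon_K$ with $\sigma^2 = \sum_{x\ne x'\in N} K^2(x,x')$ leads to

$$q^{(N)}_{K,1-\alpha} \le \kappa\ln(2/\alpha)\sqrt{\int_{\mathbb{X}^{[2]}} K^2(x,y)\,dN_x\,dN_y}.$$

Hence $q^\alpha_{K,1-\beta/2}$ is upper bounded by the $(1-\beta/2)$ quantile of

$$\kappa\ln(2/\alpha)\sqrt{\int_{\mathbb{X}^{[2]}} K^2(x,y)\,dN_x\,dN_y}.$$

Using Markov's inequality again and Lemma 5.4.III in [11], we obtain that

$$\mathbb{P}_{f,g}\Big(\int_{\mathbb{X}^{[2]}} K^2(x,y)\,dN_x\,dN_y \ge \frac{2n^2B_K}{\beta}\Big) \le \frac{\beta}{2},$$

and

$$q^\alpha_{K,1-\beta/2} \le \kappa\ln(2/\alpha)\,n\sqrt{\frac{2B_K}{\beta}}.$$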

4.3. Proof of Theorem 1. First notice that for every $r > 0$, and every kernel function $K$ satisfying Assumption 3,

$$\mathcal{E}_K = \frac{n^2r}{2}\Big(\|f-g\|^2 + r^{-2}\|K[f-g]\|^2 - \|(f-g) - r^{-1}K[f-g]\|^2\Big).$$

With the notations of Proposition 2, let $C_K$ be any upper bound for $B_K$. Since $A_K \le \|K[f-g]\|^2\|f+g\|_\infty$, from Proposition 2, we deduce that $\mathbb{P}_{f,g}(\Phi_{K,\alpha} = 0) \le \beta$ if

$$\|f-g\|^2 + r^{-2}\|K[f-g]\|^2 - \|(f-g) - r^{-1}K[f-g]\|^2 \ge 4\sqrt{\frac{2\|f+g\|_\infty\|K[f-g]\|^2}{n\beta r^2}} + \frac{2}{nr\sqrt{\beta}}\big(2 + \kappa\sqrt{2}\ln(2/\alpha)\big)\sqrt{C_K}.$$

By using the elementary inequality $2ab \le a^2 + b^2$ with $a = \|K[f-g]\|/r$ and $b = 2\sqrt{2}\sqrt{\|f+g\|_\infty/(n\beta)}$ in the right hand side of the above condition, this condition can be replaced by:

$$\|f-g\|^2 \ge \|(f-g) - r^{-1}K[f-g]\|^2 + \frac{8\|f+g\|_\infty}{n\beta} + \frac{2}{nr\sqrt{\beta}}\big(2 + \kappa\sqrt{2}\ln(2/\alpha)\big)\sqrt{C_K}.$$

We can even add an infimum over $r$ in the right hand side of the condition, since $r$ can be arbitrarily chosen. Let us now justify our choices for $C_K$.

[Projection kernel case] We consider an orthonormal basis $\{\varphi_\lambda, \lambda\in\Lambda\}$ of a subspace $S$ of $L^2(\mathbb{X}, d\nu)$ and $K(x,x') = \sum_{\lambda\in\Lambda}\varphi_\lambda(x)\varphi_\lambda(x')$. When the dimension of $S$ is finite, equal to $D$,

$$B_K \le \|f+g\|^2_\infty\int_{\mathbb{X}^2}\Big(\sum_{\lambda\in\Lambda}\varphi_\lambda(x)\varphi_\lambda(x')\Big)^2 d\nu_x\,d\nu_{x'} \le \|f+g\|^2_\infty D.$$

When the dimension of $S$ is infinite,

$$B_K = \int_{\mathbb{X}^2}\Big(\sum_{\lambda\in\Lambda}\varphi_\lambda(x)\varphi_\lambda(x')\Big)^2 (f+g)(x)(f+g)(x')\,d\nu_x\,d\nu_{x'}$$
$$\le \|f+g\|_\infty\int_{\mathbb{X}^2}\Big(\sum_{\lambda\in\Lambda}\varphi_\lambda(x)\varphi_\lambda(x')\Big)^2 (f+g)(x')\,d\nu_x\,d\nu_{x'}$$
$$\le \|f+g\|_\infty\int_{\mathbb{X}^2}\Big(\sum_{\lambda,\lambda'\in\Lambda}\varphi_\lambda(x)\varphi_\lambda(x')\varphi_{\lambda'}(x)\varphi_{\lambda'}(x')\Big)(f+g)(x')\,d\nu_x\,d\nu_{x'}$$
$$\le \|f+g\|_\infty\sum_{\lambda,\lambda'\in\Lambda}\Big(\int_{\mathbb{X}}\varphi_\lambda(x)\varphi_{\lambda'}(x)\,d\nu_x\Big)\Big(\int_{\mathbb{X}}\varphi_\lambda(x')\varphi_{\lambda'}(x')(f+g)(x')\,d\nu_{x'}\Big),$$

where we have used the assumption (2.12) to invert the sum and the integral. Hence we have, by orthogonality, and since by assumption (2.11), $\sum_{\lambda\in\Lambda}\varphi^2_\lambda(x) \le D$,

$$B_K \le \|f+g\|_\infty\sum_{\lambda\in\Lambda}\int_{\mathbb{X}}\varphi^2_\lambda(x')(f+g)(x')\,d\nu_{x'} \le \|f+g\|_\infty\|f+g\|_1 D.$$

[Approximation kernel case] Assume now that $\mathbb{X} = \mathbb{R}^d$ and introduce an approximation kernel such that $\int k^2(x)\,d\nu_x < +\infty$ and $k(-x) = k(x)$, $h = (h_1,\dots,h_d)$, with $h_i > 0$ for every $i$, and $K(x,x') = k_h(x-x')$, with $k_h(x_1,\dots,x_d) = \frac{1}{\prod_{i=1}^d h_i}k\big(\frac{x_1}{h_1},\dots,\frac{x_d}{h_d}\big)$. In this case,

$$B_K = \int k_h^2(x-x')(f+g)(x)(f+g)(x')\,d\nu_x\,d\nu_{x'} \le \|f+g\|_\infty\int k_h^2(x-x')(f+g)(x)\,d\nu_x\,d\nu_{x'} \le \frac{\|f+g\|_\infty\|f+g\|_1\|k\|^2}{\prod_{i=1}^d h_i}.$$

This ends the proof of Theorem 1.






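4.4. Proof of Theorem 2. We first recall that when $K$ is chosen as in the [Reproducing kernel case], under the assumptions of Theorem 2, $\mathcal{E}_K = n^2\|m_f - m_g\|^2_{\mathcal{H}_K}$ (see Section 2.1).

Since $A_K = \int_{\mathbb{X}}\Big\langle\int_{\mathbb{X}}\theta(x)(f-g)(x)\,d\nu_x, \theta(y)\Big\rangle^2_{\mathcal{H}_K}(f+g)(y)\,d\nu_y$, by the Cauchy-Schwarz inequality for the norm $\|\cdot\|_{\mathcal{H}_K}$ in the RKHS, we obtain:

$$A_K \le \Big\|\int_{\mathbb{X}}\theta(x)(f-g)(x)\,d\nu_x\Big\|^2_{\mathcal{H}_K}\int_{\mathbb{X}}\|\theta(y)\|^2_{\mathcal{H}_K}(f+g)(y)\,d\nu_y.$$

Now, since for every $y$ in $\mathbb{X}$, $\|\theta(y)\|^2_{\mathcal{H}_K} = K(y,y) = \kappa_0$,

$$A_K \le \kappa_0\Big\|\int_{\mathbb{X}}\theta(x)(f-g)(x)\,d\nu_x\Big\|^2_{\mathcal{H}_K}\|f+g\|_1 = \kappa_0\|m_f - m_g\|^2_{\mathcal{H}_K}\|f+g\|_1.$$

This leads to

$$2n\sqrt{\frac{2nA_K}{\beta}} \le 2n\sqrt{\frac{2\kappa_0 n\|f+g\|_1}{\beta}}\,\|m_f - m_g\|_{\mathcal{H}_K} \le \frac{n^2}{2}\|m_f - m_g\|^2_{\mathcal{H}_K} + \frac{4\kappa_0 n\|f+g\|_1}{\beta}.$$

Finally, noting that $B_K \le \kappa_0^2\|f+g\|_1^2$ and that by assumption $\|f+g\|_1 = 2$, we obtain the desired result from Proposition 2 and obvious calculations.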

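4.5. Proof of Proposition 3. First let us rewrite here a result due to Romano and Wolf [45].

Lemma 1. Let $Y_0,\dots,Y_B$ be $B+1$ exchangeable variables. Then for all $u$ in $(0,1)$,

$$\mathbb{P}\Big(\frac{1}{B+1}\Big(1 + \sum_{b=1}^B\mathbf{1}_{Y_b\ge Y_0}\Big) \le u\Big) \le u.$$

Assume that $(H_0)$ is satisfied. Conditionally on $N$, the observed statistic $\hat T^{\varepsilon^0}_K := \hat T_K$ has the same distribution as, and is independent of, the $\hat T^{\varepsilon^b}_K$'s for $b = 1,\dots,B$. Therefore the variables $\hat T^{\varepsilon^b}_K$ for $b = 0,\dots,B$ are exchangeable variables given $N$. Hence applying Lemma 1, we obtain:

$$\mathbb{P}\big(\hat\Phi_{K,\alpha} = 1\,\big|\, N\big) = \mathbb{P}\big(\hat T_K > \hat q^{(N)}_{K,1-\alpha}\,\big|\, N\big) = \mathbb{P}\Big(\sum_{b=1}^B\mathbf{1}_{\hat T^{\varepsilon^b}_K\ge\hat T^{\varepsilon^0}_K} \le \lfloor B\alpha\rfloor\,\Big|\, N\Big)$$
$$= \mathbb{P}\Big(\frac{1}{B+1}\Big(1 + \sum_{b=1}^B\mathbf{1}_{\hat T^{\varepsilon^b}_K\ge\hat T^{\varepsilon^0}_K}\Big) \le \frac{\lfloor B\alpha\rfloor + 1}{B+1}\,\Big|\, N\Big) \le \frac{\lfloor B\alpha\rfloor + 1}{B+1}.$$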



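4.6. Proof of Proposition 4. Let $t = q^{\alpha_B}_{K,1-\beta_B/2}$. By definition of $\hat q^{(N)}_{K,1-\alpha}$, we have

$$\mathbb{P}_{f,g}\big(\hat q^{(N)}_{K,1-\alpha} > t\big) = \mathbb{P}_{f,g}\Big(\sum_{b=1}^B\mathbf{1}_{\hat T^{\varepsilon^b}_K\le t} < B(1-\alpha)\Big),$$

and

$$\mathbb{P}_{f,g}\Big(\sum_{b=1}^B\mathbf{1}_{\hat T^{\varepsilon^b}_K\le t} < B(1-\alpha),\ F_K(t) \ge 1-\alpha_B\Big) \le \mathbb{P}_{f,g}\Big(\sum_{b=1}^B\big(\mathbf{1}_{\hat T^{\varepsilon^b}_K\le t} - F_K(t)\big) < B(1-\alpha) - B(1-\alpha_B)\Big).$$

So we can decompose as follows:

$$\mathbb{P}_{f,g}\big(\hat q^{(N)}_{K,1-\alpha} > t\big) \le \mathbb{P}_{f,g}\big(F_K(t) < 1-\alpha_B\big) + \mathbb{P}_{f,g}\Big(\sum_{b=1}^B\big(\mathbf{1}_{\hat T^{\varepsilon^b}_K\le t} - F_K(t)\big) < -B\sqrt{\frac{\ln B}{2B}}\Big).$$

By Hoeffding's inequality applied to the second probability given $N$, we obtain:

$$\mathbb{P}_{f,g}\big(\hat q^{(N)}_{K,1-\alpha} > t\big) \le \mathbb{P}_{f,g}\big(F_K(t) < 1-\alpha_B\big) + \frac{1}{B}.$$

But by definition of $t$, this becomes

$$\mathbb{P}_{f,g}\big(\hat q^{(N)}_{K,1-\alpha} > t\big) \le \frac{\beta_B}{2} + \frac{1}{B} = \frac{\beta}{2}.$$

Let us now control the probability of second kind error of the test $\hat\Phi_{K,\alpha}$.

$$\mathbb{P}_{f,g}\big(\hat T_K \le \hat q^{(N)}_{K,1-\alpha}\big) \le \mathbb{P}_{f,g}\big(\hat T_K \le \hat q^{(N)}_{K,1-\alpha},\ \hat q^{(N)}_{K,1-\alpha} \le t\big) + \mathbb{P}_{f,g}\big(\hat q^{(N)}_{K,1-\alpha} > t\big) \le \mathbb{P}_{f,g}\big(\hat T_K \le t\big) + \beta/2.$$

We deduce from (4.1) that if

$$\mathcal{E}_K > 2n\sqrt{\frac{2nA_K + B_K}{\beta}} + t,$$

then $\mathbb{P}_{f,g}(\hat T_K \le t) \le \beta/2$, and $\mathbb{P}_{f,g}(\hat\Phi_{K,\alpha} = 0) \le \beta$. An upper bound for $t$ is finally derived from (2.9), which concludes the proof.

5. Supplementary materials.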

5.1. Simulation study. 

5.1.1. Presentation of the simulation study. In this section, we study our testing procedures from a practical point of view. We consider $X = [0,1]$ or $X = \mathbb R$, $n = 100$ and $\nu$ the Lebesgue measure on $X$. $N^1$ and $N^{-1}$ denote two independent Poisson processes with intensities $f$ and $g$ on $X$ with respect to $\mu$, with $d\mu = 100\,d\nu$. We focus on several couples of intensities $(f,g)$ defined on $X$ and such that $\int_X f(x)\,d\nu_x = \int_X g(x)\,d\nu_x = 1$. We choose $\alpha = 0.05$.
Conditionally on the number of points of both processes $N^1$ and $N^{-1}$, the points of $N^1$ and $N^{-1}$ form two independent samples of i.i.d. variables with densities $f$ and $g$ with respect to $\nu$. Hence, conditionally on the number of points of $N^1$ and $N^{-1}$, any test for the classical two-sample problem for i.i.d. samples can be used here. We compare our tests to the conditional Kolmogorov-Smirnov test. Thus we consider five testing procedures, that we respectively denote by KS, Ne, Th, G, E. The testing procedure KS corresponds to the conditional Kolmogorov-Smirnov test. The testing procedures Ne and Th respectively correspond to $\Phi_\alpha^{(1)}$ and $\Phi_\alpha^{(2)}$ defined in [Multiple kernels case - Example 1] and [Multiple kernels case - Example 2] with $J = 7$ and $J = 6$. The testing procedures G and E are similar to the test $\Phi_\alpha$ defined in [Multiple kernels case - Example 4]. For G, we consider the standard Gaussian approximation kernel defined by $k(x) = (2\pi)^{-1/2}\exp(-x^2/2)$ for all $x \in \mathbb R$ and for E, we consider the Epanechnikov approximation kernel defined by $k(x) = (3/4)(1-x^2)\mathbf 1_{|x|\le 1}$. For both tests, we take $\{h_m, m \in \mathcal M\} = \{1/24, 1/16, 1/12, 1/8, 1/4, 1/2\}$ and the corresponding collection of kernels $\{K_m, m \in \mathcal M\}$ given for all $m$ in $\mathcal M$ by $K_m(x,x') = \frac{1}{h_m}k\big(\frac{x-x'}{h_m}\big)$. We also take for both tests $w_m = 1/|\mathcal M| = 1/6$.
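As an illustration of the two approximation kernels and the bandwidth collection described above, here is a minimal NumPy sketch. The function and constant names (`gaussian_kernel`, `epanechnikov_kernel`, `gram_matrix`, `BANDWIDTHS`, `WEIGHTS`) are hypothetical and chosen only for this example; they are not taken from the authors' code.

```python
import numpy as np

# Bandwidths h_m and uniform weights w_m = 1/|M| = 1/6, as in the text.
BANDWIDTHS = (1 / 24, 1 / 16, 1 / 12, 1 / 8, 1 / 4, 1 / 2)
WEIGHTS = np.full(len(BANDWIDTHS), 1.0 / len(BANDWIDTHS))


def gaussian_kernel(x):
    """Standard Gaussian kernel k(x) = (2*pi)^(-1/2) * exp(-x^2 / 2)."""
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)


def epanechnikov_kernel(x):
    """Epanechnikov kernel k(x) = (3/4) * (1 - x^2) on |x| <= 1, zero elsewhere."""
    x = np.asarray(x, dtype=float)
    return 0.75 * (1.0 - x**2) * (np.abs(x) <= 1.0)


def gram_matrix(points, h, k):
    """Matrix of K_m(x, x') = k((x - x') / h) / h over all pairs of points."""
    points = np.asarray(points, dtype=float)
    diff = points[:, None] - points[None, :]
    return k(diff / h) / h
```

For instance, `[gram_matrix(pts, h, epanechnikov_kernel) for h in BANDWIDTHS]` would give the six kernel matrices used by a test of type E on a pooled sample `pts`.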

Let us recall that our tests reject $(H_0)$ when there exists $m$ in $\mathcal M$ such that $\hat T_{K_m} > q^{(N)}_{m,1-u_\alpha e^{-w_m}}$, where $N$ is the pooled process obtained from $N^1$ and $N^{-1}$, and $u_\alpha$ is defined by (3.1). Hence, for each observation of the process $N$ whose number of points is denoted by $N_n$, we have to estimate $u_\alpha$ and the quantiles $q^{(N)}_{m,1-u_\alpha e^{-w_m}}$. These estimations are done by classical Monte Carlo methods based on the simulation of 400000 independent samples of size $N_n$ of i.i.d. Rademacher variables (see Section 2.4 for the theoretical study of these Monte Carlo methods when single tests are considered). Half of the samples are used to estimate the distribution of each $\hat T^{\varepsilon}_{K_m}$. The other half is used to approximate the conditional probabilities occurring in (3.1). The approximation of $u_\alpha$ is obtained by dichotomy, such that the estimated conditional probability occurring in (3.1) is less than $\alpha$, but as close as possible to $\alpha$. By monotonicity arguments, this is equivalent to letting $u$ vary on a regular grid of $[0,1]$ with step $2^{-16}$, and choosing the approximation of $u_\alpha$ as the largest value of $u$ on the grid such that the estimated conditional probabilities in (3.1) are less than $\alpha$.
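The calibration just described can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes that the bootstrapped statistic has the form $\sum_{x \neq x'} K_m(x,x')\varepsilon_x\varepsilon_{x'}$ with Rademacher signs (the statistic is not restated in this section), it uses a simple order-statistic convention for empirical quantiles, and all function names are hypothetical.

```python
import numpy as np


def bootstrap_statistics(grams, n_draws, rng):
    """Return an (n_draws, M) array of sum_{x != x'} K_m(x, x') eps_x eps_x'
    for i.i.d. Rademacher signs eps (assumed form of the statistic)."""
    n_points = grams[0].shape[0]
    eps = rng.choice([-1.0, 1.0], size=(n_draws, n_points))
    columns = []
    for gram in grams:
        off_diag = gram - np.diag(np.diag(gram))  # drop the x = x' terms
        columns.append(np.einsum("bi,bi->b", eps @ off_diag, eps))
    return np.column_stack(columns)


def empirical_quantiles(sorted_stats, levels):
    """Order-statistic quantile of order levels[m] for column m (columns sorted)."""
    n_draws = sorted_stats.shape[0]
    idx = np.clip(np.ceil(levels * n_draws).astype(int) - 1, 0, n_draws - 1)
    return sorted_stats[idx, np.arange(sorted_stats.shape[1])]


def rejection_rate(u, sorted_first_half, second_half, weights):
    """Estimated conditional probability that some statistic of the second half
    exceeds its quantile of order 1 - u * exp(-w_m) estimated on the first half."""
    q = empirical_quantiles(sorted_first_half, 1.0 - u * np.exp(-weights))
    return float(np.mean((second_half > q).any(axis=1)))


def calibrate_u(first_half, second_half, weights, alpha, grid_bits=16):
    """Dichotomy on the grid {k * 2^-grid_bits}: largest u with rate <= alpha.
    Relies on the rate being nondecreasing in u; returns 0.0 if even u = 0 fails
    only implicitly (the search then never moves away from 0)."""
    sorted_first = np.sort(first_half, axis=0)
    if rejection_rate(1.0, sorted_first, second_half, weights) <= alpha:
        return 1.0
    lo, hi = 0, 2**grid_bits
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if rejection_rate(mid / 2**grid_bits, sorted_first, second_half, weights) <= alpha:
            lo = mid
        else:
            hi = mid
    return lo / 2**grid_bits
```

In the setting of the text one would draw 400000 sign vectors in total and pass one half as `first_half` and the other as `second_half`; the sketch works with any smaller number of draws at the cost of a noisier calibration.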

5.1.2. Simulation results. We first study the probability of first kind error of each test for three common intensities. The first one is the uniform density on $[0,1]$, the second one is the Beta density with parameters $(2,5)$, and the third one is a Laplace density with parameter 7. Let
\[
f_1(x) = \mathbf 1_{[0,1]}(x), \qquad
f_{2,2,5}(x) = \frac{x(1-x)^4}{\int_0^1 x(1-x)^4\,dx}\,\mathbf 1_{[0,1]}(x), \qquad
f_{3,\lambda}(x) = \frac{\lambda}{2}\,e^{-\lambda|x-1/2|}.
\]

Taking $f$ as one of these three functions, we realize 5000 simulations of two independent Poisson processes $N^1$ and $N^{-1}$, both with intensity $f$ w.r.t. $\mu$. For each simulation, we determine the conclusions of the tests KS, Ne, Th, G and E, where the critical values of our four last tests are approximated by the Monte Carlo methods described above. The probabilities of first kind error of the tests are estimated by the number of rejections for these tests divided by 5000. The results are given in the following table:



f            KS       Ne       Th       G       E
f_1          0.053    0.049    0.045    0.053   0.053
f_{2,2,5}    0.053    0.047    0.043    0.051   0.050
f_{3,7}      0.0422   0.0492   0.0438   0.054   0.055
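To read the table against the nominal level, one can compare each entry with the Monte Carlo standard error of a proportion estimated from 5000 simulations. The sketch below is an added illustration based on the usual binomial approximation, not a computation from the paper; the names are hypothetical.

```python
import math

ALPHA = 0.05   # nominal level used in the study
N_SIM = 5000   # number of simulations behind each table entry

# Estimated first kind error rates copied from the table (KS, Ne, Th, G, E).
ESTIMATED_LEVELS = {
    "f_1": [0.053, 0.049, 0.045, 0.053, 0.053],
    "f_{2,2,5}": [0.053, 0.047, 0.043, 0.051, 0.050],
    "f_{3,7}": [0.0422, 0.0492, 0.0438, 0.054, 0.055],
}


def binomial_standard_error(p, n_sim):
    """Standard error of an empirical proportion when the true rate is p."""
    return math.sqrt(p * (1.0 - p) / n_sim)


STANDARD_ERROR = binomial_standard_error(ALPHA, N_SIM)

# Deviation of each entry from the nominal level, in standard-error units.
Z_SCORES = {
    name: [(value - ALPHA) / STANDARD_ERROR for value in row]
    for name, row in ESTIMATED_LEVELS.items()
}
LARGEST_DEVIATION = max(abs(z) for row in Z_SCORES.values() for z in row)
```

Under this approximation the standard error is about 0.003, and every entry of the table lies within three standard errors of 0.05.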



We then study the probability of second kind error of each test, or more precisely the power of each test, for several alternatives. We consider alternative intensities $(f,g)$ such that $f = f_1$ and $g$ is successively equal to intensities that are classical examples in wavelet settings, and are defined by
\[
g_{1,a,\varepsilon}(x) = (1+\varepsilon)\mathbf 1_{[0,a)}(x) + (1-\varepsilon)\mathbf 1_{[a,2a)}(x) + \mathbf 1_{[2a,1)}(x),
\]
\[
g_{2,\eta}(x) = \Big(1 + \eta\sum_j \frac{h_j}{2}\big(1 + \mathrm{sgn}(x-p_j)\big)\Big)\frac{\mathbf 1_{[0,1]}(x)}{C_2(\eta)},
\]
\[
g_{3,\varepsilon}(x) = (1-\varepsilon)\mathbf 1_{[0,1]}(x) + \varepsilon\Big(\sum_j g_j\Big(1 + \frac{|x-p_j|}{w_j}\Big)^{-4}\Big)\frac{\mathbf 1_{[0,1]}(x)}{0.284},
\]
where $p$, $h$, $g$, $w$, $\varepsilon$ are defined as in [18], $0 < \varepsilon \le 1$, $0 < a < 1/2$, $\eta > 0$ and $C_2(\eta)$ is such that $\int_0^1 g_{2,\eta}(x)\,dx = 1$. We also consider alternative intensities $(f,g)$ such that $f$ is equal to the above Laplace density $f_{3,7}$ with parameter 7, or to the Laplace density $f_{3,10}$ with parameter 10, such that $f_{3,10}(x) = 5e^{-10|x-1/2|}$, and $g = g_{4,1/2,1/4}$ is the density of a Gaussian variable with expectation 1/2 and standard deviation 1/4.
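As a concrete illustration of the last alternative, the following sketch simulates one pair of Poisson processes with intensities $f_{3,7}$ and $g_{4,1/2,1/4}$ with respect to $\mu$, using the fact recalled earlier that, given the number of points, the points are i.i.d. with the corresponding density. The helper names are hypothetical and the total mass 100 corresponds to $d\mu = 100\,d\nu$.

```python
import numpy as np


def simulate_poisson_process(sampler, rng, total_mass=100.0):
    """Draw the number of points from Poisson(total_mass), then sample that
    many i.i.d. points from the normalized intensity via sampler(rng, size)."""
    n_points = rng.poisson(total_mass)
    return sampler(rng, n_points)


def laplace_sampler(lam):
    """Sampler for the density (lam / 2) * exp(-lam * |x - 1/2|)."""
    return lambda rng, size: rng.laplace(loc=0.5, scale=1.0 / lam, size=size)


def gaussian_sampler(mean=0.5, sd=0.25):
    """Sampler for the Gaussian density with the given mean and standard deviation."""
    return lambda rng, size: rng.normal(loc=mean, scale=sd, size=size)


rng = np.random.default_rng(1)
n_plus = simulate_poisson_process(laplace_sampler(7.0), rng)   # plays the role of N^1
n_minus = simulate_poisson_process(gaussian_sampler(), rng)    # plays the role of N^{-1}

# Pooled process with marks +1 / -1 recording the process of origin.
pooled = np.concatenate([n_plus, n_minus])
marks = np.concatenate([np.ones(n_plus.size), -np.ones(n_minus.size)])
```

The pooled points and their marks are the inputs a kernel-based two-sample statistic would use; the conditional Kolmogorov-Smirnov test only needs the two samples `n_plus` and `n_minus`.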

For each alternative $(f,g)$, we realize 1000 simulations of two independent Poisson processes $N^1$ and $N^{-1}$ with respective intensities $f$ and $g$ w.r.t. $\mu$. For each simulation, we determine the conclusions of the tests KS, Ne, Th, G and E, where the critical values of our four last tests are still approximated by the Monte Carlo methods described above. The powers of the tests are estimated by the number of rejections divided by 1000. The results are summarized in Figures 1 and 2 where in each column, the estimated power is represented as a dot for every test. The triangles represent the upper and lower bounds of an asymptotic confidence interval with confidence level 99%, with variance estimation.
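The confidence bounds drawn as triangles can be sketched as a normal-approximation interval with a plug-in variance estimate. This is one plausible reading of "asymptotic confidence interval with variance estimation", stated here as an assumption; the function name and the example count of 500 rejections are made up for illustration.

```python
import math


def power_confidence_interval(n_reject, n_sim=1000, z=2.5758):
    """Asymptotic 99% interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n_sim),
    clipped to [0, 1]; z = 2.5758 is the 0.995 standard normal quantile."""
    p_hat = n_reject / n_sim
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n_sim)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# Hypothetical example: 500 rejections out of 1000 simulations.
lower, upper = power_confidence_interval(500)
```

With 1000 simulations the half-width is at most about 0.04, which gives a sense of how far apart two estimated powers must be before the intervals separate.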



Figure 1. Left: $(f,g) = (f_1, g_{1,a,\varepsilon})$. Each column corresponds respectively to $(a,\varepsilon) = (1/4, 0.7)$, $(1/4, 0.9)$, $(1/4, 1)$ and $(1/8, 1)$. Right: $(f,g) = (f_1, g_{2,\eta})$. Each column corresponds respectively to $\eta = 4$, 8 and 15.



p = ( 0.1 0.13 0.15 0.23 0.25 0.4 0.44 0.65 0.76 0.78 0.81 )
h = ( 4 -4 3 -3 5 -5 2 4 -4 2 -3 )
g = ( 4 5 3 4 5 4.2 2.1 4.3 3.1 5.1 4.2 )
w = ( 0.005 0.005 0.006 0.01 0.01 0.03 0.01 0.01 0.005 0.008 0.005 )

Figure 2. Left: $(f,g) = (f_1, g_{3,\varepsilon})$. The two columns correspond respectively to $\varepsilon = 0.5$ and 1. Right: $(f,g) = (f_{3,\lambda}, g_{4,1/2,1/4})$. The two columns correspond respectively to $\lambda = 7$ and $\lambda = 10$.



In all cases, the tests G and E based on approximation kernels are more powerful (sometimes even about 4 times more powerful) than the KS test. This is also the case for the test Ne, except for the last example. The test Th is more powerful than the KS test for the alternatives $(f,g) = (f_1, g_{1,a,\varepsilon})$, but it fails to improve on the KS test for the other alternatives. We conjecture that the test Th consists in the aggregation of too many single tests. We can finally notice that the test E performs strongly for every considered alternative, except in a sparse case, where the test E is less powerful than the test Th (see Figure 1). Our conclusion is that the test E is a good practical choice, except maybe when sparse processes are involved. Aggregating the tests E and Th in such cases would probably be a good compromise.

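5.2. Proof of Theorem 3 and Theorem 4. It is clear from the definition of $u_\alpha$ that the test defined by $\Phi_\alpha$ is of level $\alpha$. Obviously, by Bonferroni's inequality, $u_\alpha \ge \alpha$, hence, setting $\alpha_m = \alpha e^{-w_m}$, we have
\[
P_{f,g}\big(\exists m \in \mathcal M,\ \hat T_{K_m} > q^{(N)}_{m,1-u_\alpha e^{-w_m}}\big)
\ge P_{f,g}\big(\exists m \in \mathcal M,\ \hat T_{K_m} > q^{(N)}_{m,1-\alpha_m}\big)
\]
\[
\ge 1 - P_{f,g}\big(\forall m \in \mathcal M,\ \hat T_{K_m} \le q^{(N)}_{m,1-\alpha_m}\big)
\ge 1 - \inf_{m \in \mathcal M} P_{f,g}\big(\hat T_{K_m} \le q^{(N)}_{m,1-\alpha_m}\big)
\ge 1 - \beta,
\]
as soon as there exists $m$ in $\mathcal M$ such that $P_{f,g}\big(\hat T_{K_m} \le q^{(N)}_{m,1-\alpha_m}\big) \le \beta$. We can now apply Theorem 1, replacing $\ln(2/\alpha)$ by $(\ln(2/\alpha) + w_m)$, to conclude the proof.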

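5.3. Proof of Corollary 1. Let us first find an upper bound for $\rho(\Phi_\alpha^{(1)}, \mathcal B_{\delta,\gamma,\infty}(R,R',R''), \beta)$. Considering (3.5), we in fact only need to find a sharp upper bound for the right hand side of the inequality when $(f,g)$ belongs to $\mathcal B_{\delta,\gamma,\infty}(R,R',R'')$. So let us assume here that $(f,g) \in \mathcal B_{\delta,\gamma,\infty}(R,R',R'')$. Then $(f-g) \in \mathcal B^{\delta}_{2,\infty}(R)$, and it is well known (see [18] for instance) that in this case,
\[
\|(f-g) - \Pi_{S_J}(f-g)\|^2 \le C(\delta)R^2 2^{-2J\delta}.
\]
Since the constant $C(\alpha,\beta,\|f\|_\infty,\|g\|_\infty)$ in (3.5) can be upper bounded by a constant $C(\alpha,\beta,R'')$, the right hand side of (3.5) can be upper bounded by
\[
C(\alpha,\beta,\delta,R,R'') \inf_{J \in \mathcal M_J}\Big\{2^{-2J\delta} + (\ln(J+2))\frac{2^{J/2}}{n}\Big\}.
\]
Now, taking
\[
J^* = \Big\lfloor \log_2\Big(\Big(\frac{n}{\ln\ln n}\Big)^{\frac{2}{4\delta+1}}\Big)\Big\rfloor,
\]
we obtain
\[
C(\alpha,\beta,\delta,R,R'') \inf_{J \in \mathcal M_J}\Big\{2^{-2J\delta} + (\ln(J+2))\frac{2^{J/2}}{n}\Big\}
\le C(\alpha,\beta,\delta,R,R'')\Big\{2^{-2J^*\delta} + (\ln(J^*+2))\frac{2^{J^*/2}}{n}\Big\}
\le C(\alpha,\beta,\delta,R,R'')\Big(\frac{\ln\ln n}{n}\Big)^{\frac{4\delta}{4\delta+1}}.
\]
This leads to
\[
\rho(\Phi_\alpha^{(1)}, \mathcal B_{\delta,\gamma,\infty}(R,R',R''), \beta) \le C(\alpha,\beta,\delta,R,R'')\Big(\frac{\ln\ln n}{n}\Big)^{\frac{2\delta}{4\delta+1}}.
\]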
Of course a similar upper bound applies to ^/ a - 
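The choice of $J^*$ comes from balancing the bias and variance terms. The following worked step is a sketch under two simplifying assumptions that are not spelled out in this section: $\ln(J+2)$ is replaced by its order of magnitude $\ln\ln n$ (valid for $J$ of order $\log_2 n$), and (3.5) is read as a bound on the squared $L^2$ distance, so that the separation rate is the square root of the balanced bound.

```latex
\[
2^{-2J\delta} = \frac{\ln\ln n}{n}\,2^{J/2}
\iff 2^{J(4\delta+1)/2} = \frac{n}{\ln\ln n}
\iff 2^{J} = \Big(\frac{n}{\ln\ln n}\Big)^{\frac{2}{4\delta+1}},
\]
\[
\text{so that}\quad 2^{-2J\delta} = \Big(\frac{\ln\ln n}{n}\Big)^{\frac{4\delta}{4\delta+1}}
\quad\text{and}\quad
\sqrt{2^{-2J\delta}} = \Big(\frac{\ln\ln n}{n}\Big)^{\frac{2\delta}{4\delta+1}}.
\]
```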

Let us now find an upper bound for $\rho(\Phi_\alpha^{(2)}, \mathcal B_{\delta,\gamma,\infty}(R,R',R''), \beta)$. Considering (3.6), we only need to find an upper bound for the right hand side of the inequality when $(f,g)$ belongs to $\mathcal B_{\delta,\gamma,\infty}(R,R',R'')$. Let $J$ be an integer that will be chosen later. As in [18], for any $m \subset \Lambda_J = \{(j,k),\ j \in \{0,\ldots,J-1\},\ k \in \{0,\ldots,2^j-1\}\}$, one can write
\[
\|(f-g) - \Pi_{S_m}(f-g)\|^2 = \|(f-g) - \Pi_{S_J}(f-g)\|^2 + \|\Pi_{S_m}(f-g) - \Pi_{S_J}(f-g)\|^2.
\]
Let us define the coefficients: $\alpha_\lambda = \langle f-g, \varphi_\lambda\rangle$ for every $\lambda$ in $\{0\} \cup \Lambda_J$, and let us consider $m$ such that $\{\alpha_\lambda, \lambda \in m\}$ is the set of the $D$ largest coefficients among $\{\alpha_\lambda, \lambda \in \{0\} \cup \Lambda_J\}$. From [18] p. 36 for instance, we deduce that

\[
\|\Pi_{S_m}(f-g) - \Pi_{S_J}(f-g)\|^2 \le C(\gamma)R'^{2+4\gamma}D^{-2\gamma}.
\]

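As above, we also have that
\[
\|(f-g) - \Pi_{S_J}(f-g)\|^2 \le C(\delta)R^2 2^{-2J\delta}.
\]
Since the constant $C(\alpha,\beta,\|f\|_\infty,\|g\|_\infty)$ in (3.6) can be upper bounded by a constant $C(\alpha,\beta,R'')$, taking
\[
J = \lfloor \log_2 n^{\varepsilon} \rfloor + 1
\]
for some $\varepsilon > 0$, the right hand side of (3.6) is upper bounded by
\[
C(\alpha,\beta,\delta,\gamma,R,R',R'')\Big(n^{-2\varepsilon\delta} + D^{-2\gamma} + \frac{D\ln n}{n}\Big).
\]
Now, taking $D = \lfloor (n/\ln n)^{1/(2\gamma+1)} \rfloor$, and $\varepsilon > \gamma/(\delta(2\gamma+1))$, one obtains that when $\delta < \gamma/2$, then $D \le 2^J$, and
\[
\rho(\Phi_\alpha^{(2)}, \mathcal B_{\delta,\gamma,\infty}(R,R',R''), \beta) \le C(\alpha,\beta,\delta,\gamma,R,R',R'')\Big(\frac{\ln n}{n}\Big)^{\frac{\gamma}{2\gamma+1}}.
\]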
Since this upper bound also applies to \I/ Q , one has 
P {^ a ,B 5ili00 (R,R',R"),p) 



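\[
\le C(\alpha,\beta,\delta,\gamma,R,R',R'') \inf\Big\{\Big(\frac{\ln\ln n}{n}\Big)^{\frac{2\delta}{4\delta+1}}, \Big(\frac{\ln n}{n}\Big)^{\frac{\gamma}{2\gamma+1}}\Big\}.
\]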

5.4. Proof of the lower bounds. We give here the arguments to derive from the results given in [18] lower bounds for the minimax separation rates over $\mathcal B_{\delta,\gamma,\infty}(R,R',R'')$. As usual, we introduce a finite subset $\mathcal C$ of $\mathcal B_{\delta,\gamma,\infty}(R,R',R'')$, composed of couples of intensities which are particularly difficult to distinguish. Here one can use the finite subset of possible intensities $S_{M,D,r}$ that has been defined in [18] Equation (6.4), and define
\[
\mathcal C = \{(f,g),\ f = \rho\mathbf 1_{[0,1]} \text{ and } g \in S_{M,D,r}\},
\]
for some fixed positive $\rho$. Next the computations of the lower bounds of [18] can be completely reproduced once we remark that the likelihood ratio
\[
\frac{dP_{\rho\mathbf 1_{[0,1]},\,g}}{dP_{\rho\mathbf 1_{[0,1]},\,\rho\mathbf 1_{[0,1]}}}(N^1,N^{-1}) = \frac{dP_{\rho\mathbf 1_{[0,1]}}}{dP_{\rho\mathbf 1_{[0,1]}}}(N^1) \times \frac{dP_{g}}{dP_{\rho\mathbf 1_{[0,1]}}}(N^{-1}),
\]
where on the left hand side $P_{f,g}$ represents the joint distribution of two independent Poisson processes $N^1$ and $N^{-1}$, with respective intensities $f$ and $g$, and on the right hand side $P_f$ represents the distribution of one Poisson process with intensity $f$. This means that the likelihood ratios that have been considered in [18] are exactly the ones we need here to compute the expected lower bounds. The results are consequently identical.

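5.5. Proof of Corollary 2. Considering (3.8), we mainly have to find a sharp upper bound for
\[
\inf_{(m_1,m_2) \in \mathcal M}\Big\{\|(f-g) - k_{m_1,h_{m_2}} * (f-g)\|^2 + \frac{w_{(m_1,m_2)}}{n}\sqrt{\frac{\|f+g\|_\infty\|f+g\|_1\|k_{m_1}\|_2^2}{2^{-dm_2}}}\Big\}
\]
when $(f,g)$ belongs to $\mathcal S^{\delta}_d(R,R',R'')$.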

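Let us first control the bias term $\|(f-g) - k_{m_1,h_{m_2}} * (f-g)\|^2$. Plancherel's theorem gives that when $(f-g) \in L^1(\mathbb R^d) \cap L^2(\mathbb R^d)$,
\[
(2\pi)^d\|(f-g) - k_{m_1,h_{m_2}} * (f-g)\|^2 = \big\|\big(1 - \widehat{k_{m_1,h_{m_2}}}\big)\widehat{(f-g)}\big\|^2 = \int_{\mathbb R^d}\big|1 - \widehat{k_{m_1}}(2^{-m_2}u)\big|^2\,\big|\widehat{(f-g)}(u)\big|^2\,d\nu_u.
\]
Assume now that $(f,g) \in \mathcal S^{\delta}_d(R,R',R'')$, and take $m_1^* = \min\{m_1 \in \mathcal M_1,\ m_1 > \delta\}$. Note that since $\|k_{m_1^*}\|_1 < +\infty$ and $k_{m_1^*}$ satisfies the condition (3.7), there also exists some constant $C(\delta) > 0$ such that
\[
\operatorname*{Ess\,sup}_{u \in \mathbb R^d \setminus \{0\}} \frac{\big|1 - \widehat{k_{m_1^*}}(u)\big|}{\|u\|_d^{\delta}} \le C(\delta).
\]
Then
\[
\|(f-g) - k_{m_1^*,h_{m_2}} * (f-g)\|^2 \le \frac{C(\delta)}{(2\pi)^d}\int_{\mathbb R^d}\|2^{-m_2}u\|_d^{2\delta}\,\big|\widehat{(f-g)}(u)\big|^2\,d\nu_u,
\]
and since $(f-g) \in \mathcal S^{\delta}_d(R)$,
\[
\|(f-g) - k_{m_1^*,h_{m_2}} * (f-g)\|^2 \le 2^{-2\delta m_2}C(\delta)R^2.
\]
Furthermore, $\|k_{m_1^*}\|_2 \le C(\delta)$, so
\[
\inf_{(m_1,m_2) \in \mathcal M}\Big\{\|(f-g) - k_{m_1,h_{m_2}} * (f-g)\|^2 + \frac{w_{(m_1,m_2)}}{n}\sqrt{\frac{\|f+g\|_\infty\|f+g\|_1\|k_{m_1}\|_2^2}{2^{-dm_2}}}\Big\}
\le C(\delta,\alpha,\beta,R)\inf_{m_2 \in \mathcal M_2}\Big\{2^{-2\delta m_2} + \frac{w_{(m_1^*,m_2)}}{n}\sqrt{\frac{\|f+g\|_\infty\|f+g\|_1}{2^{-dm_2}}}\Big\}.
\]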

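Choosing
\[
m_2^* = \Big\lfloor \log_2\Big(\Big(\frac{n}{\ln\ln n}\Big)^{\frac{2}{d+4\delta}}\Big)\Big\rfloor
\]
leads to
\[
2^{-2\delta m_2^*} \le 2^{2\delta}\Big(\frac{\ln\ln n}{n}\Big)^{\frac{4\delta}{d+4\delta}},
\]
and since $w_{(m_1^*,m_2^*)} \le C(\delta,d)\ln\ln n$,
\[
\frac{w_{(m_1^*,m_2^*)}}{n}\sqrt{\frac{\|f+g\|_\infty\|f+g\|_1}{2^{-dm_2^*}}} \le C(\delta,d)\sqrt{\|f+g\|_\infty\|f+g\|_1}\,\Big(\frac{\ln\ln n}{n}\Big)^{\frac{4\delta}{d+4\delta}}.
\]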



Noting that $1/n \le (\ln\ln n/n)^{4\delta/(d+4\delta)}$, when $(f,g) \in \mathcal S^{\delta}_d(R,R',R'')$, the right hand side of (3.8) is upper bounded by
\[
C(\delta,\alpha,\beta,R,R',R'',d)\Big(\frac{\ln\ln n}{n}\Big)^{\frac{4\delta}{d+4\delta}}.
\]
This concludes the proof of Corollary 2.
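The value of $m_2^*$ used in this proof can be recovered by equating the bias bound $2^{-2\delta m_2}$ with the order of the variance term. The worked step below is a sketch that assumes the variance term is of order $(\ln\ln n/n)\,2^{dm_2/2}$, which is how the garbled display is read here, and ignores integer rounding.

```latex
\[
2^{-2\delta m_2} = \frac{\ln\ln n}{n}\,2^{dm_2/2}
\iff 2^{m_2(d+4\delta)/2} = \frac{n}{\ln\ln n}
\iff 2^{m_2} = \Big(\frac{n}{\ln\ln n}\Big)^{\frac{2}{d+4\delta}},
\]
\[
\text{so that}\quad 2^{-2\delta m_2} = \frac{\ln\ln n}{n}\,2^{dm_2/2} = \Big(\frac{\ln\ln n}{n}\Big)^{\frac{4\delta}{d+4\delta}}.
\]
```

For $d = 1$ this reduces to the exponent $4\delta/(4\delta+1)$ obtained in the univariate case.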



5.6. Proof of Corollary 3. As in the previous section, considering (3.9), we here have to find a sharp upper bound for
\[
\inf_{m_2 \in \mathcal M_2}\Big\{\|(f-g) - k_{1,h_{m_2}} * (f-g)\|^2 + \frac{w_{(1,m_2)}}{n}\sqrt{\frac{\|f+g\|_\infty\|f+g\|_1\|k_1\|_2^2}{\prod_{i=1}^d h_{m_2,i}}}\Big\}.
\]
Let us first evaluate $\|(f-g) - k_{1,h} * (f-g)\|^2$ when $(f-g) \in \mathcal N^{\delta}_{2,d}(R)$, and $h = (h_1,\ldots,h_d)$.

For $x = (x_1,\ldots,x_d) \in \mathbb R^d$, let $b(x) = k_{1,h} * (f-g)(x) - (f-g)(x)$. Then
\[
b(x) = \int k_1(u_1,\ldots,u_d)(f-g)(x_1+u_1h_1,\ldots,x_d+u_dh_d)\,du_1\ldots du_d - (f-g)(x),
\]
and since $\int_{\mathbb R^d} k_1(u_1,\ldots,u_d)\,du_1\ldots du_d = 1$,
\[
b(x) = \int k_1(u_1,\ldots,u_d)\big[(f-g)(x_1+u_1h_1,\ldots,x_d+u_dh_d) - (f-g)(x_1,\ldots,x_d)\big]\,du_1\ldots du_d = \sum_{i=1}^d b_i(x),
\]
where for $i = 1\ldots d$,
\[
b_i(x) = \int_{\mathbb R^d} k_1(u_1,\ldots,u_d)\big[(f-g)(x_1+u_1h_1,\ldots,x_i+u_ih_i,x_{i+1},\ldots,x_d) - (f-g)(x_1+u_1h_1,\ldots,x_i,x_{i+1},\ldots,x_d)\big]\,du_1\ldots du_d.
\]

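As in the proof of Proposition 1.5 p. 13 of [52], using the Taylor expansion of $(f-g)$ in the $i$th direction and the fact that $\int_{\mathbb R} k_{1,i}(u_i)u_i^j\,du_i = 0$ for $j = 1\ldots\lfloor\delta_i\rfloor$, we obtain that
\[
b_i(x) = \int_{\mathbb R^d} k_1(u)\frac{(u_ih_i)^{\lfloor\delta_i\rfloor}}{(\lfloor\delta_i\rfloor-1)!}\Big[\int_0^1 (1-\tau)^{\lfloor\delta_i\rfloor-1}D_i^{\lfloor\delta_i\rfloor}(f-g)(x_1+u_1h_1,\ldots,x_i+\tau u_ih_i,x_{i+1},\ldots,x_d)\,d\tau\Big]du.
\]
So,
\[
b_i(x) = \int_{\mathbb R^d} k_1(u)\frac{(u_ih_i)^{\lfloor\delta_i\rfloor}}{(\lfloor\delta_i\rfloor-1)!}\Big[\int_0^1 (1-\tau)^{\lfloor\delta_i\rfloor-1}\Big(D_i^{\lfloor\delta_i\rfloor}(f-g)(x_1+u_1h_1,\ldots,x_i+\tau u_ih_i,x_{i+1},\ldots,x_d) - D_i^{\lfloor\delta_i\rfloor}(f-g)(x_1+u_1h_1,\ldots,x_i,\ldots,x_d)\Big)d\tau\Big]du.
\]
Hence, by using twice Lemma 1.1 p. 13 of [52] extended to the spaces $\mathbb R^d \times \mathbb R$ and $\mathbb R^d \times \mathbb R^d$,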



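\[
\|b_i\|_2^2 \le \int_{\mathbb R^d}\Big(\int_{\mathbb R^d}|k_1(u)|\frac{|u_ih_i|^{\lfloor\delta_i\rfloor}}{(\lfloor\delta_i\rfloor-1)!}\Big[\int_0^1 (1-\tau)^{\lfloor\delta_i\rfloor-1}\Big|D_i^{\lfloor\delta_i\rfloor}(f-g)(x_1+u_1h_1,\ldots,x_i+\tau u_ih_i,x_{i+1},\ldots,x_d) - D_i^{\lfloor\delta_i\rfloor}(f-g)(x_1+u_1h_1,\ldots,x_i,\ldots,x_d)\Big|d\tau\Big]du\Big)^2dx
\]
\[
\le \Big(\int_{\mathbb R^d}|k_1(u)|\frac{|u_ih_i|^{\lfloor\delta_i\rfloor}}{(\lfloor\delta_i\rfloor-1)!}\Big[\int_{\mathbb R^d}\Big(\int_0^1 (1-\tau)^{\lfloor\delta_i\rfloor-1}\Big|D_i^{\lfloor\delta_i\rfloor}(f-g)(x_1+u_1h_1,\ldots,x_i+\tau u_ih_i,x_{i+1},\ldots,x_d) - D_i^{\lfloor\delta_i\rfloor}(f-g)(x_1+u_1h_1,\ldots,x_i,\ldots,x_d)\Big|d\tau\Big)^2dx\Big]^{1/2}du\Big)^2,
\]
and
\[
\|b_i\|_2^2 \le \Big(\int_{\mathbb R^d}|k_1(u)|\frac{|u_ih_i|^{\lfloor\delta_i\rfloor}}{(\lfloor\delta_i\rfloor-1)!}\Big[\int_0^1 (1-\tau)^{\lfloor\delta_i\rfloor-1}\Big(\int_{\mathbb R^d}\Big|D_i^{\lfloor\delta_i\rfloor}(f-g)(x_1+u_1h_1,\ldots,x_i+\tau u_ih_i,x_{i+1},\ldots,x_d) - D_i^{\lfloor\delta_i\rfloor}(f-g)(x_1+u_1h_1,\ldots,x_i,\ldots,x_d)\Big|^2dx\Big)^{1/2}d\tau\Big]du\Big)^2.
\]
When $(f-g) \in \mathcal N^{\delta}_{2,d}(R)$,
\[
\|b_i\|_2 \le C(\delta_i)R\int |k_1(u)||u_ih_i|^{\delta_i}\,du \le C(\delta_i)\Big(\int |k_1(u)||u_i|^{\delta_i}\,du\Big)Rh_i^{\delta_i}.
\]
So,
\[
\|k_{1,h} * (f-g) - (f-g)\|_2 \le C(\delta)R\sum_{i=1}^d h_i^{\delta_i}.
\]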



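Let us now find some $m_2$ in $\mathcal M_2$ giving a sharp upper bound for
\[
\|(f-g) - k_{1,h_{m_2}} * (f-g)\|^2 + \frac{w_{(1,m_2)}}{n}\sqrt{\frac{\|f+g\|_\infty\|f+g\|_1\|k_1\|_2^2}{\prod_{i=1}^d h_{m_2,i}}}.
\]
Let $m_2^* = (m_{2,1}^*,\ldots,m_{2,d}^*)$ with
\[
m_{2,i}^* = \Big\lfloor \log_2\Big(\Big(\frac{n}{\ln\ln n}\Big)^{\frac{2\bar\delta}{\delta_i(1+4\bar\delta)}}\Big)\Big\rfloor
\]
for every $i = 1\ldots d$. Since $h_{m_2^*} = (2^{-m_{2,1}^*},\ldots,2^{-m_{2,d}^*})$,
\[
\|(f-g) - k_{1,h_{m_2^*}} * (f-g)\| \le C(\delta)R\sum_{i=1}^d 2^{-m_{2,i}^*\delta_i},
\]
so
\[
\|(f-g) - k_{1,h_{m_2^*}} * (f-g)\|^2 \le C(\delta,R)\,d^2\Big(\frac{\ln\ln n}{n}\Big)^{\frac{4\bar\delta}{1+4\bar\delta}}.
\]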
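Moreover, it is easy to see that $w_{(1,m_2^*)} \le C(\delta,d)\ln\ln n$, and hence
\[
\frac{w_{(1,m_2^*)}}{n}\sqrt{\frac{\|f+g\|_\infty\|f+g\|_1\|k_1\|_2^2}{\prod_{i=1}^d 2^{-m_{2,i}^*}}}
\le C(\delta,\alpha,\beta,R',R'',d)\,\frac{\ln\ln n}{n}\Big(\frac{n}{\ln\ln n}\Big)^{\frac{1}{1+4\bar\delta}}
\le C(\delta,\alpha,\beta,R',R'',d)\Big(\frac{\ln\ln n}{n}\Big)^{\frac{4\bar\delta}{1+4\bar\delta}}.
\]
Since
\[
\frac{1}{n} \le \Big(\frac{\ln\ln n}{n}\Big)^{\frac{4\bar\delta}{1+4\bar\delta}}
\]
when $\ln\ln n \ge 1$, this ends the proof of Corollary 3.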